(Copies of the following references are submitted along with the instant response)

1. Widespread occurrence of three sequence motifs in diverse S-Adenosylmethionine-dependent methyltransferases suggests a common structure for these enzymes. Archives of Biochemistry and Biophysics 310 (2), 417-427, 1994 (common structure was found in 69 methyltransferases).



2. Crystal structure of the chemotaxis receptor methyltransferase CheR suggests a 
conserved structural motif for binding S-Adenosylmethionine. Structure 5, 545-558, 
1997 (protein methyltransferase was found to share the same catalytic domain with DNA, 
RNA and small molecule methyltransferases). 

3. Structure of PvuII DNA-(cytosine N4) methyltransferase, an example of domain permutation and protein fold assignment. Nucleic Acids Research 25 (14), 2702-2715, 1997 (DNA methyltransferase PvuII had identical binding domains for SAM as did M-TaqI and M-HhaI).
ERRATUM

Volume 310, Number 2 (1994), in the article "Widespread Occurrence of Three Sequence Motifs in Diverse S-Adenosylmethionine-Dependent Methyltransferases Suggests a Common Structure for These Enzymes," by Ron M. Kagan and Steven Clarke, pages 417-427: On page 425, the sequence in Table III identified as E. coli AdoMet decarboxylase (and discussed on page 422 of the text) is in fact that of spermidine synthase, encoded by the E. coli speE gene (GenBank Accession No. J02804). Sequence analysis of E. coli AdoMet decarboxylase does not demonstrate any of the three methyltransferase motifs as stated, and thus lessens the chances that the motif III-like sequences of the human, rat, and hamster AdoMet decarboxylases described in Table III are significant. Therefore, it does not appear that any AdoMet decarboxylase shares sequence motifs with the group of methyltransferases described in the paper. On the other hand, it is now clear that spermidine synthase, from both E. coli and human (GenBank Accession No. M64231) sources, does have the three sequence motifs characteristic of many S-adenosylmethionine-dependent methyltransferases. This enzyme catalyzes propylamine transfer from decarboxylated S-adenosylmethionine to putrescine. Interestingly, both spermidine synthases share sequence similarity to a recently identified S-adenosylmethionine-dependent putrescine N-methyltransferase (GenBank Accession No. D28506). The authors thank Professor Anthony Pegg (Pennsylvania State University) for helping bring these facts to their attention and regret any confusion caused by their error.
Three regions of sequence similarity have been reported in several protein and small-molecule S-adenosylmethionine-dependent methyltransferases. Using multiple alignments, we have now identified these three regions in a much broader group of methyltransferases and have used these data to define a consensus for each region. Of the 84 non-DNA methyltransferase sequences in the GenBank, NBRF PIR, and Swissprot databases comprising 37 distinct enzymes, we have found 69 sequences possessing motif I. This motif is similar to a conserved region previously described in DNA adenine and cytosine methyltransferases. Motif II is found in 46 sequences, while motif III is found in 61 sequences. All three regions are found in 45 of these enzymes, and an additional 15 have motifs I and III. The motifs are always found in the same order on the polypeptide chain and are separated by comparable intervals. We suggest that these conserved regions contribute to the binding of the substrate S-adenosylmethionine and/or the product S-adenosylhomocysteine. These motifs can also be identified in certain non-methyltransferases that utilize either S-adenosylmethionine or S-adenosylhomocysteine, including S-adenosylmethionine decarboxylase, S-adenosylmethionine synthetase, and S-adenosylhomocysteine hydrolase. In the latter two types of enzymes, motif I is similar to the conserved nucleotide binding motif of protein kinases and other nucleotide binding proteins. These motifs may be of use in predicting methyltransferases and related enzymes from the open reading frames generated by genomic sequencing projects. © 1994 Academic Press, Inc.




Of the 3196 enzymes described in the latest version of Enzyme Nomenclature (1), about 3% catalyze the attack of a variety of nitrogen, oxygen, carbon, and sulfur nucleophiles on the methyl group of S-adenosylmethionine. These methyltransferases include enzymes that result in the formation of methyl ester, methyl ether, methyl thioether, methyl amine, methyl amide, and other derivatives on proteins, nucleic acids, polysaccharides, lipids, and various small molecules. We have been interested in the possibility that the similarity in the simple catalytic chemistry of these reactions is reflected in amino acid sequence and three-dimensional structural similarities of the enzymes. Since all of these enzymes bind the methyl donor AdoMet² and produce AdoHcy as a product, a similar binding pocket in these enzymes may be reflected in similar amino acid sequences. Such a situation is found, for example, in the GTP-binding proteins, which possess three short sequence motifs with distinct spacing that interact with GTP (2, 3).

The first indications of sequence similarities between different types of methyltransferases were shown in enzymes that modify DNA (4, 5). For example, DNA m5C methyltransferases have 10 regions of sequence similarity, two of which are also shared with DNA m6A methyltransferases (5, 6). It was later shown that at least one of these motifs (termed motif I in this work) is also shared with several RNA, protein, and small molecule methyltransferases and with AdoHcy hydrolases (7). Two additional sequence motifs (termed motif II and motif III in this work) are found in a number of protein and small molecule
methyltransferases, in at least one tRNA methyltransferase, and in AdoHcy hydrolase (7). Other studies have added examples to motif I (8-12) and motif III (11, 12).

¹ To whom correspondence should be addressed.
² Abbreviations: AdoMet, S-adenosyl-L-methionine; AdoHcy, S-adenosyl-L-homocysteine.

KAGAN AND CLARKE

TABLE I
Methyltransferase Sequences Examined in This Study

Enzyme | Gene(a) | EC Number | Organism | Motifs(b) | Accession(c)

Protein carboxyl MTases
Isoaspartyl O-MT | PIMT | 2.1.1.77 | human | I, II, III | A3S404
Isoaspartyl O-MT | PIMT | 2.1.1.77 | bovine | I, II, III | A34242
Isoaspartyl O-MT | PIMT | 2.1.1.77 | mouse | I, II, III | M60320
Isoaspartyl O-MT | PIMT | 2.1.1.77 | rat | I, II, III | D11475
Isoaspartyl O-MT | PIMT | 2.1.1.77 | wheat | I, II, III | L07941
Isoaspartyl O-MT | pcm | 2.1.1.77 | E. coli | I, II, III | M63493
γ-Glutamyl O-MT | cheR | 2.1.1.80 | E. coli | I, II, III | M13463
γ-Glutamyl O-MT | cheR | 2.1.1.80 | S. typhimurium | I, II, III | J02757
γ-Glutamyl O-MT | frzF | 2.1.1.80 | M. xanthus | I, II, III | M3520U
Isoprenylcysteine O-MT | STE14 | 2.1.1.100 | yeast | nd | L15442

Small molecule O-MTases
Hydroxyindole O-MT | HOMT | 2.1.1.4 | human | I, II, III | M83779
Hydroxyindole O-MT | HOMT | 2.1.1.4 | bovine | I, II, III | J02671
Hydroxyindole O-MT | HOMT | 2.1.1.4 | chicken | I, II, III | X62309
Catechol O-MT | COMT | 2.1.1.6 | human | I, II, III | M65212
Catechol O-MT | COMT | 2.1.1.6 | rat | I, II, III | M60754
Caffeic acid O-MT | (CAOMT) | 2.1.1.68 | maize | I, II, III | M73235
Caffeic acid O-MT | (CAOMT) | 2.1.1.68 | alfalfa | I, II, III | M63853
Caffeic acid O-MT | (CAOMT) | 2.1.1.68 | aspen | I, II, III | X62096
Caffeoyl-CoA O-MT | (CCOMT) | 2.1.1.104 | parsley | I, II, III | M69184
O-Demethylpuromycin O-MT | dmpM | 2.1.1.38 | S. alboniger | I, II, III | M74560
Hydroxyneurosporene O-MT | crtF | | R. capsulatus | I, II, III | S04408
myo-Inositol O-MT | Imt1 | 2.1.1.39/40 | M. crystallinum | I, II, III | M87340
Carminomycin O-MT | dnrK | | S. peucetius | I, II, III | L13453
Tetracenomycin 3-O-MT | tcmN | | S. glaucescens | I, II, III | M80674
Tetracenomycin 8-O-MT | tcmO | | S. glaucescens | I, II, III | M80674
Midecamycin O-MT | mdmC | | S. mycarofaciens | I, II, III | M93958
Erythromycin biosynthesis O-MT | eryG | | S. erythraea | I, II, III | S18533

Small molecule N-MTases
Phenylethanolamine N-MT | PNMT | 2.1.1.28 | human | I, II, III | J03280
Phenylethanolamine N-MT | PNMT | 2.1.1.28 | bovine | I, II, III | M36706
Phenylethanolamine N-MT | PNMT | 2.1.1.28 | rat | I, II, III | X14211
Phenylethanolamine N-MT | PNMT | 2.1.1.28 | mouse | I, II, III | L12687
Glycine N-MT | GNMT | 2.1.1.20 | rabbit | I, II, III | D13307
Glycine N-MT | GNMT | 2.1.1.20 | pig | I, II, III | D13308
Glycine N-MT | GNMT | 2.1.1.20 | rat | I, II, III | X06150
Guanidinoacetate N-MT | (GANMT) | 2.1.1.2 | rat | I, II, III | J03588
Histamine N-MT | HNMT | 2.1.1.8 | rat | I, II, III | D10693
Diphthamide N-MT | DPH5 | 2.1.1.98 | yeast | nd | M83375

Small molecule S-MTase
Thioether S-MT | (TSMT) | 2.1.1.96 | mouse | I, II, III | M88694

Porphyrin precursor C-MTases
Precorrin-2 MT | cobI | | P. denitrificans | | M59301
Precorrin-3 MT | cobF | | P. denitrificans | I, III | M59301
Precorrin-3 MT | cobJ | | P. denitrificans | I, III | M59301
Precorrin-3 MT | cobL | | P. denitrificans | I, II, III | M59301
Precorrin-3 MT | cobM | | P. denitrificans | I, III | M59301
Uroporphyrinogen III MT | cobA | 2.1.1.107 | B. megaterium | I, II, III | M62881
Uroporphyrinogen III MT | UMT | 2.1.1.107 | M. ivanovii | I, III | M62874
Uroporphyrinogen III MT | UMT | 2.1.1.107 | Pseudomonas sp. | I, III |
Uroporphyrinogen III MT | cysG | 2.1.1.107 | E. coli | I, II, III |
Uroporphyrinogen III MT | cysG | 2.1.1.107 | S. typhimurium | I, II, III |
Magnesium protoporphyrin MT | bchH | 2.1.1.11 | R. capsulatus | nd | M74001

Lipid MTases
DHPB O-MT | COQ3 | 2.1.1.64 | rat | I, II, III | L20427
DHHB O-MT | COQ3 | 2.1.1.64 | yeast | I, II, III | M73270
UbiG O-MT | ubiG | | E. coli | I, II, III | M87509
Phosphatidylethanolamine MT | PEM1 | 2.1.1.17 | yeast | nd | M16987
Phosphatidylethanolamine MT | pmtA | 2.1.1.71 | R. sphaeroides | I, II, III | L07247
Phospholipid MT | PEM2 | 2.1.1.71 | yeast | nd | M16988
Cyclopropane fatty acid synthase | cfa | 2.1.1.79 | E. coli | I, II, III | M98330

RNA MTases
tRNA (uracil-5-) MT | trmA | 2.1.1.35 | E. coli | nd |
tRNA (guanine-N1-) MT | trmD | 2.1.1.31 | E. coli | II, III |
tRNA (guanine-N2,N2-) MT | trm1 | | yeast | nd |
rRNA MT | grm | | | nd |
rRNA MT | grm | | S. tenebrarius | nd |
rRNA MT | grm | | M. rosea | nd |
rRNA MT | sgm | | S. zionensis | nd |
rRNA MT | | | | III |
rRNA MT | kgmB | | | nd | M64625
rRNA MT | | | S. hirsutus | nd |
rRNA MT | carB | | S. thermotolerans | I |
rRNA N6 A MT | lrm | 2.1.1.48 | S. lividans | I | JS0635
rRNA N6 A MT | ERMA | 2.1.1.48 | C. diphtheriae | I, III | X51472
rRNA N6 A MT | ermA | 2.1.1.48 | S. aureus | I, III | A25101
rRNA N6 A MT | ermBC | 2.1.1.48 | E. coli | I, III | B27739
rRNA N6 A MT | ermBP | 2.1.1.48 | C. perfringens | I, III | M77169
rRNA N6 A MT | ermC | 2.1.1.48 | S. aureus | I | M19652
rRNA N6 A MT | ermCd | 2.1.1.48 | C. diphtheriae | I |
rRNA N6 A MT | ermD | 2.1.1.48 | B. licheniformis | I |
rRNA N6 A MT | ermE | 2.1.1.48 | S. erythraeus | I | M11200
rRNA N6 A MT | ermF | 2.1.1.48 | B. fragilis | I | A251S7
rRNA N6 A MT | ermG | 2.1.1.48 | B. sphaericus | I, III | M15332
rRNA N6 A MT | ermJ | 2.1.1.48 | B. anthracis | I | L08389
rRNA N6 A MT | ermK | 2.1.1.48 | B. licheniformis | I | B42473
rRNA N6 A MT | ermM | 2.1.1.48 | S. epidermidis | I, III | A24497
rRNA N6 A MT | emrR | 2.1.1.48 | L. reuteri | I, III | M64090
rRNA N6 A MT | ermSF | 2.1.1.48 | S. fradiae | I, III | M19269
rRNA N6,N6 A MT | ksgA | | E. coli | I | X06536

a Nonstandard gene designations in parentheses.
b nd, not detected.
c Accession numbers are from GenBank/EMBL release 77 (roman type), NBRF PIR release 36 (italic type), and Swissprot release 24 (underlined).

To establish more clearly the limits of using these motifs to characterize methyltransferases, we have now extended our analysis of methyltransferase sequences. We find that 45 of 84 available methyltransferase sequences contain all three motifs spaced at similar intervals, and an additional 15 enzymes possess motifs I and III but appear to lack motif II. We suggest that these sequence motifs may represent core elements of the polypeptide that are brought together in the three-dimensional structure to interact directly with the cofactor. Thirteen of the 84 enzymes do not appear to contain any of these motifs. These latter enzymes may simply have diverged too greatly for the motifs to be identified, or they may represent an independent approach to AdoMet or AdoHcy utilization. We have also discerned intriguing sequence similarities between motifs I and III in methyltransferases, AdoMet synthetases, AdoHcy hydrolases, and AdoMet decarboxylases. We speculate that these similar motifs may comprise part of an evolutionarily conserved AdoMet and AdoHcy binding structure in these classes of enzymes.

METHODS 

Multiple sequence alignments were carried out using the Megalign program from DNASTAR (Madison, WI). Multiple alignments are created according to the CLUSTAL V method (13). The initial parameters used to construct the alignments were as follows: a k-tuple of 1 or 2, a window of 5 residues, and a gap penalty of 3 were applied to the pairwise alignments. A gap penalty of 10 and a gap length penalty of 10 or 20 were applied to the multiple alignments. All alignments were evaluated using the PAM 250 distance matrix (14). Two-way protein sequence alignments were carried out in Megalign according to the Lipman-Pearson method (15).

The consensus for each motif was determined by computing the weighted amino acid frequency at each position as described in the legend to Fig. 1. The frequencies were divided by the Dayhoff frequency for each amino acid (14), and amino acids that scored 2.5 or greater were chosen for the consensus. We did not apply this standard to tryptophan residues because the Dayhoff frequency of tryptophan is only 1%, so a single tryptophan residue in a group of 26-29 distinct enzymes would score above 2.5 by this method.
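The weighting scheme in the Fig. 1 legend and the 2.5-fold cutoff above can be combined into a short consensus-calling routine. The Python sketch below is illustrative only: the function names are hypothetical, the input is assumed to be motif segments already extracted from a gapless alignment, and the background frequencies in the usage note are placeholder values that should be replaced with the published Dayhoff table (14).

```python
from collections import Counter, defaultdict

CUTOFF = 2.5  # enrichment over background required for a consensus residue


def weighted_frequencies(segments):
    """Relative residue frequency at each motif position.

    segments: list of (activity, motif_segment) pairs of equal length.
    Each sequence is weighted 1/n, where n is the number of sequences that
    share its enzymatic activity; weighted sums are then divided by the
    number of distinct activities (the procedure in the Fig. 1 legend).
    """
    per_activity = Counter(activity for activity, _ in segments)
    n_activities = len(per_activity)
    width = len(segments[0][1])
    totals = [defaultdict(float) for _ in range(width)]
    for activity, segment in segments:
        if len(segment) != width:
            raise ValueError("motif segments must all have the same length")
        weight = 1.0 / per_activity[activity]
        for position, residue in enumerate(segment):
            totals[position][residue] += weight
    return [{res: w / n_activities for res, w in column.items()} for column in totals]


def consensus_column(freqs, background, cutoff=CUTOFF, exempt=("W",)):
    """Residues whose frequency/background ratio meets the cutoff.

    Tryptophan is exempt by default, mirroring the exclusion in Methods.
    Residues are returned most frequent first.
    """
    passing = [
        (freq, residue)
        for residue, freq in freqs.items()
        if residue not in exempt and freq / background[residue] >= cutoff
    ]
    passing.sort(key=lambda item: (-item[0], item[1]))
    return [residue for _, residue in passing]
```

For example, with illustrative backgrounds of 0.089 for G, 0.033 for C, and 0.010 for W, a column in which G has relative frequency 0.70, C 0.12, and W 0.04 yields `["G", "C"]`: glycine and cysteine pass the cutoff, while tryptophan is skipped even though 0.04/0.010 = 4 would exceed it.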



RESULTS 

Excluding motif I, most DNA methyltransferases do not appear to have conserved sequence blocks in common with protein, lipid, and small-molecule C-, O-, and S-methyltransferases. As the conserved sequence motifs of DNA methyltransferases have been fully described elsewhere (4, 5, 16), we chose to focus here on the sequence motifs in enzymes active on substrates other than DNA (Table I). We selected the 84 sequences presented in Table
I from searches of the translated GenBank database re- 
TABLE II

Methyltransferase Sequence Motifs I-III



Motif I 



IVEV C P 
VLDIGGGTG 



81 
81 
82 
82 
87 
79 



AJJ2Y£SGS£ 
A LDVG SGSG 
APfiMfiSfiSi 

YlEiGTGSfi 
138 ADTL£TA£fi 
138 ADALfiMARfi 
127 LA£LfiALSL 
b 

183 ICf2L£££S£ 

184 IYr>L£fi£G£ 
112 JLL£LfiAYCH 
62 yilLfiAYCfi 
205 LVDVGGG VG 
204 UPVQQGTG 

204 LVPVGGGTQ 

77 TMEIGVYTG 
208 VVDIGG ADG 
232 ttMDVGGGTG 

205 IVPVG GWI6 
183 YIPYGGQK& 
331 IAHL£££D£ 
173 F VDLGG ARG 

VLEIG TFTG 
YiJlttfiF£Lfi 

LIDIGSGPT 
VLDV A CGTG 
VLDV A CGTG 
VLDV A CGTG 
VLEVGF£MA 
JlSlGlifiAfi 



64 
85 
75 
75 
51 
86 
59 
58 
61 
64 
56 



60 <,LIGLGS£ET 
7 GRLIGVGTG 
9 IU1£I£1G 
5 LYVVGTGPG 
265 LWJUfiGfiSfi 
VHFIGAGPG 
VYLVGAGPG 
VYLVGAGPG 
VWLVGAGPG 
218 VVLVGAGPG 
218 VVLVGAGPQ 
91 HJffii££G£ 
130 VLDVGCG GG 
76 Y IPYG CGGfi 
42 VLEVfr VGTG 
171 mil^Wfi 



3 
4 
3 

17 



Motif II 



Motif III 



Gene 



Organism 



A Y L 
GTY VIV 
PQFDAIFC" 

150 AP YDAI HV 

150 AP YDAI HV 

151 AP YDAI HV 
151 AP YDAI HV 
157 AP YDAI HV 
140 AP PPA II V 
222 GP FDAIFC 
222 &PFPAIFC 
209 SSLflLIU: 
271 PEADLYIL 

243 PEADLYIL 

244 PE ADLYIL 

183 DILfiMY£L 
133 DTLDMVFL 
258 PAG DAIL M 
257 EKAPAYFM 

257 PK ADAVF M 
149 GTFD FVFV 
269 GGGDLYVL 
288 QGAQVIT.L 

258 PQAPAIFM 
244 RK APA HL 
393 TGYPAYLF- 
234 PRAfiVFIV 
135 GAFDLVFV 
149 EIEflRYTS 
173 LPAfiALYS 
173 LPAMLYS 
149 LPACALYS 

184 LPAfiALVS 
129 GGFPAVIC 

128 GGFDAVK 

129 DGFPAYK 
127 fiH£fiGULY 
135 EKDtfiFIHM 
156 £LAJBCyjLT 



328 EflPPAIFI 
47 SDMIIIC 



261 RDADRVFV 
261 RDADRVFV 
155 ECfJ2A*YA 
192 GOFP IITC 
137 GQYD VVTC 
103 ETFD TVVA 
227 DQFD RIVS 
66 PLRpAiHA 



V 

L I 
K IIFL 
LLRPGGRLLI 

171 O LKPGGRLIL 
171 0LKPGGR1 II 

171 O LKPGGRLIL 

172 QLKPGGRLIL 
178 QLKPGGRMYI 
161 QiDE££IiXL 
251 LLKP D GLLF A 
251 LLKP D GLLF A 
238 A LRPGGLLFL 
300 TCKE££GIiV 

272 ACRT£fiGXLV 

273 AC RPGG GVLL 
209 LlfiK£TVHA 
159 LLEKfiTVJ.i.A 
288 LPENfiKVlYV 
28 7 LPDN£KVIYA 
287 LPEN£KVILV 

173 LVKI GGLI GY 
298 AMPAHAB11V 
317 ALPRftiBlII 
287 SIAK££KIH 
273 A1E£JG£R1LI 
423 IGDDDARLL1 
263 ALTPGGAVLV 
159 LV RPGGL VAI 
176 VLKPGGVLAI 
204 LLRPGG HLLL 
204 LLRPGGHjJJL 
180 L LRPGG HI t I 
215 URPGGHliL 
164 MV£T££U^I 

163 MVSSGGI IVT 

164 MVRPQGLL Y I 
159 UKR££LLTY 
156 1KFFHGJJLAA 
187 LLKPSGHiYT 
112 AFLVWfiDPML 
106 AVLSEfiDPLF 

76 CMVSfifiDPGV 
351 AUSG GRIV A 

77 ARLHSfiDlSV 
TRLK££DPFV 
VRLK£fiDP£V 
LRLK^DPfV 

297 VRLK£GDP£1 
297 VRLK££DP£I 
182 V LKPGG S LFI 
220 LNPEK£1L£L 
164 LV KPGG DVFF 
130 VC£K££EVyi 
256 N]J£RE£IFLL 
101 ELATNQKLLL 
167 PSVD££IL¥I 



83 
81 
96 



Consensus? 

PIMT
PIMT
PIMT
PIMT
PIMT
pcm
cheR
cheR
frzF
HOMT
HOMT
HOMT
COMT
COMT
(CAOMT)
(CAOMT)
(CAOMT)
(CCOMT)
dmpM
crtF
Imt1
dnrK
tcmN
tcmO
mdmC
eryG
PNMT
PNMT
PNMT
PNMT
GNMT
GNMT
GNMT
(GANMT)
HNMT
(TSMT)
cobI
cobF
cobJ
cobL
cobM
cobA
UMT
UMT
cysG
cysG
COQ3
COQ3
ubiG
pmtA
cfa
trmD
lmrB



human
bovine
mouse
rat
wheat
E. coli
E. coli
S. typhimurium
M. xanthus
human
bovine
chicken
human
rat
maize
alfalfa
aspen
parsley
S. alboniger
R. capsulatus
M. crystallinum
S. peucetius
S. glaucescens
S. glaucescens
S. mycarofaciens
S. erythraea
human
bovine
rat
mouse
rabbit
pig
rat
rat
rat
mouse
P. denitrificans
P. denitrificans
P. denitrificans
P. denitrificans
P. denitrificans
B. megaterium
M. ivanovii
Pseudomonas sp.
E. coli
S. typhimurium
rat
yeast
E. coli
R. sphaeroides
E. coli
E. coli
S. lincolnensis



a The consensus at each position is defined as the residue(s) present at a frequency of 2.5 times or more the natural abundance of each amino acid. The first consensus row immediately above the sequences represents the most common choice for that position above the cutoff. The second, third, and fourth most common residues at each position above the cutoff are in the rows above.

b Human HOMT has a 26-residue LINE-1 insertion in motif I, and it was not included in this table.
TABLE II (continued)

Motif I | Motif II | Motif III | Gene | Organism
78 VtFVGAGNG | | | carB | S. thermotolerans
39 LLEVGAGNG | | | lrm | S. lividans
78 YLEYGAfiNfi | | 169 PNVD££HYI | ERMA | C. diphtheriae
36 IIEIGPGSG | | 167 PSVDSVLIVL | ermA | S. aureus
34 | | 166 PRVNSVUKk | ermBC | E. coli
33 VYEIGTGKG | | 166 PKVNSVLIKi | ermBP | C. perfringens
33 VYEIGTGKG | | | ermC | S. aureus
36 IILUiPfiSfi | | | ermCd | C. diphtheriae
35 | | | ermD | B. licheniformis
48 VLELCAGKG | | | ermE | S. erythraeus
65 VLEAGPGEG | | | ermF | B. fragilis
37 VLDIGAGKG | | 167 PKVDSAUYL | ermG | B. sphaericus
55 VLELGAGKG | | | ermJ | B. anthracis
48 ¥L£L£A£KS | | | ermK | B. licheniformis
34 IFEIGSGKG | | 167 PKVNSSLIRL | ermM | S. epidermidis
53 | | 186 PRVNSSLIVL | emrR | L. reuteri
89 LL£Y£AfiRfi | | 226 PRVDSGILRI | ermSF | S. fradiae
41 MYLLfiPfiLA | | | ksgA | E. coli



leases 76 and 77, the NBRF-PIR database version 36, and 
from Swissprot release 24 using the keywords "methyltransferase" or "methylase." We eliminated duplicate entries, partial sequences, and putative sequences. These sequences represent 37 distinct enzymatic activities: 3 protein carboxyl methyltransferases, 12 small-molecule O-methyltransferases, 5 small-molecule N-methyltransferases, 1 small-molecule S-methyltransferase, 4 porphyrin precursor methyltransferases, 5 lipid methyltransferases, and 7 RNA methyltransferases.

Methyltransferase motif I. Motif I has been previously described in DNA methyltransferases (4, 5, 16, 17) as well as in a variety of bacterial RNA methyltransferases and in procaryotic and eucaryotic small-molecule and protein methyltransferases (7-12). We have now extended our analysis to an expanded sequence database of 84 methyltransferases and found this region to occur in 69 sequences representing 29 distinct enzymes (Tables I and II). We define motif I as a nine-residue block with the consensus sequence (V/I/L)(L/V)(D/E)(V/I)G(G/C)G(T/P)G, as shown in Table II and Fig. 1A. The amino acids at each position of the consensus were found to be present at a frequency at least 2.5 times their Dayhoff frequency (see Methods). The glycine residue at position 5 is present in all of the sequences except the glycine methyltransferases, where it is substituted by an alanine residue. The glycines at positions 7 and 9 in this motif are present in 59 and 62 of the sequences, respectively. This consensus is in agreement with the general consensus given previously for motif I, hh(D/E)hGXGXG, where "h" represents a hydrophobic residue (9). The DNA methyltransferases lack the glycine at position 5 and often have a phenylalanine at this position (6). The consensus for this region in a number of m5C and m6A DNA methyltransferases has been defined as hh(D/S)(L/P)FXGXG (6).
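The motif I consensus definitions above translate directly into regular expressions. The sketch below is an illustration, not the authors' search method; the helper names are hypothetical, and the set of residues treated as hydrophobic ("h") is an assumption that includes alanine so that a segment like ILDAGAGVG counts as matching the general pattern.

```python
import re

# Motif I consensus (V/I/L)(L/V)(D/E)(V/I)G(G/C)G(T/P)G, one class per position.
MOTIF_I = ["VIL", "LV", "DE", "VI", "G", "GC", "G", "TP", "G"]

# Assumption: residues counted as hydrophobic ("h") in hh(D/E)hGXGXG.
HYDROPHOBIC_CLASS = "[AVILMFWC]"


def to_regex(classes):
    """Turn a list of allowed-residue strings into a regex pattern."""
    return "".join(c if len(c) == 1 else f"[{c}]" for c in classes)


# Strict consensus defined in this study.
STRICT_MOTIF_I = re.compile(to_regex(MOTIF_I))

# General consensus hh(D/E)hGXGXG; "." stands for any residue X.
GENERAL_MOTIF_I = re.compile(
    HYDROPHOBIC_CLASS * 2 + "[DE]" + HYDROPHOBIC_CLASS + "G.G.G"
)


def find_motif(pattern, sequence):
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]
```

A segment such as VLDVACGTG, with alanine at position 5, fails both patterns; ILDAGAGVG fails the strict consensus at position 4 but satisfies the general one, which is why the broader hh(D/E)hGXGXG form is the one that bridges to DNA methyltransferases.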

The protein γ-glutamyl carboxyl methyltransferases from Escherichia coli and Salmonella typhimurium have the poorest match to the consensus for motif I. They are missing an acidic residue at position 3, and they match the consensus only at the glycine residues in positions 5 and 9 and at the proline residue in position 8. However, these methyltransferases have well-defined motifs II and III (Table II). The bacterial precorrin-3 and uroporphyrinogen methyltransferases, with the exception of CobL, also lack the characteristic acidic residue at position 3; a hydrophobic residue is found at this position instead (Table II).

Motif I is followed in 67 of 69 cases by an aspartate or a glutamate residue 17-19 residues C-terminal to the motif. The exceptions are the rat histamine methyltransferase, which has an asparagine at this position, and the Pseudomonas denitrificans precorrin-3 methyltransferase CobM, which has a cysteine at this position. This acidic residue is frequently preceded by a number of hydrophobic residues; the consensus is hhXh(D/E), where "h" is a hydrophobic residue (data not shown).
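Because "17-19 residues C-terminal to the motif" can be counted in more than one way, the sketch below makes one convention explicit: offset k means the k-th residue after the final glycine of motif I, so offset 1 is the residue immediately following it. This counting convention, the hydrophobic set, and the function names are assumptions for illustration.

```python
# Assumption: residues treated as hydrophobic ("h") in hhXh(D/E).
HYDROPHOBIC = frozenset("AVILMFWC")


def acidic_downstream(sequence, motif_end, offsets=range(17, 20)):
    """List (offset, residue) pairs where Asp or Glu sits downstream of motif I.

    sequence: protein sequence as one-letter codes.
    motif_end: 0-based index of the last residue (final G) of motif I.
    offsets: distances to test; offset 1 is the residue right after the motif.
    """
    hits = []
    for k in offsets:
        i = motif_end + k
        if i < len(sequence) and sequence[i] in "DE":
            hits.append((k, sequence[i]))
    return hits


def has_hhxh_prefix(sequence, i):
    """True if residues i-4, i-3 and i-1 are hydrophobic (hhXh before index i)."""
    if i < 4:
        return False
    return all(sequence[j] in HYDROPHOBIC for j in (i - 4, i - 3, i - 1))
```

Under this convention, a glutamate placed 18 residues past the final glycine is reported as `(18, "E")`; a different counting rule would shift every offset by one, so the convention should be fixed before comparing against the 67-of-69 figure.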

Methyltransferase motif II. As shown in Table II and Fig. 1B, motif II comprises an eight-residue conserved region that is found 57 ± 13 SD (range of 36-90) residues after the glycine delineating the end of motif I (Fig. 2A). It is present in 46 of the 84 sequences analyzed (26 distinct enzymes). The consensus sequence for this region is (P/G)(Q/T)(F/Y/A)DA(I/V/Y)(F/I)(C/V/L) (Table II and Fig. 1B). The central aspartate is invariant. There are a number of positions in close proximity to motif II that
FIG. 1. Relative amino acid distribution in methyltransferase sequence motifs. (A) Motif I. (B) Motif II. (C) Motif III. Individual enzymes were assigned a fractional weighting factor inversely proportional to the number of instances that the distinct enzymatic activity occurred in different species in Table I. For example, there are three glycine N-methyltransferases in Table I, so they each received a weighting factor of 0.333. The residues at each position were tabulated in one column and their respective weighting factors were tabulated in the adjacent column. The residues and the weighting factors were then sorted by amino acid, and the weighting factors for each amino acid were summed to give the frequency of occurrence of each amino acid at each position. The relative frequency at each position was calculated by dividing the amino acid frequency by the number of distinct enzymatic activities in Table I.



are unusually rich in the aromatic amino acids phenylalanine, tyrosine, and tryptophan. Figure 3 shows that at positions -8, -1, and +3 with respect to the central aspartate, the relative frequencies of aromatic residues are 0.31, 0.49, and 0.29, respectively. In contrast, the Dayhoff frequency for the three aromatic residues totals only 0.08 (14). Of the 26 distinct enzymes that have motif II, only one, the TrmD tRNA methyltransferase in E. coli, is devoid of aromatic residues in this span.
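The position-by-position aromatic frequencies quoted above can be computed with the same per-activity weighting used for Fig. 1. A minimal sketch, assuming each entry supplies its enzymatic activity, its sequence, and the 0-based index of the central aspartate of motif II; the function name and input layout are hypothetical, and the window (9 residues before, 10 after) follows the Fig. 3 legend.

```python
from collections import Counter

AROMATIC = frozenset("FYW")


def aromatic_profile(entries, upstream=9, downstream=10):
    """Weighted fraction of aromatic residues at each offset from the Asp.

    entries: list of (activity, sequence, asp_index) tuples.
    Each sequence is weighted 1/n for the n sequences sharing its activity,
    and the weighted count is divided by the number of distinct activities.
    Offsets that fall outside a sequence contribute nothing.
    """
    per_activity = Counter(activity for activity, _, _ in entries)
    n_activities = len(per_activity)
    profile = {}
    for offset in range(-upstream, downstream + 1):
        total = 0.0
        for activity, sequence, asp_index in entries:
            i = asp_index + offset
            if 0 <= i < len(sequence) and sequence[i] in AROMATIC:
                total += 1.0 / per_activity[activity]
        profile[offset] = total / n_activities
    return profile
```

Dividing each value by the combined Dayhoff frequency of F, Y, and W (0.08) gives the enrichment; for instance, the 0.49 reported at position -1 is roughly six times background.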

Methyltransferase motif III. As shown in Table II and Fig. 1C, motif III comprises a 10-residue conserved region that is found 22 ± 5 SD (range of 12-38) residues after the end of motif II (Fig. 2B). It is present in 61 of the 84 sequences examined (28 distinct enzymes). The consensus sequence for this region is LL(R/K)PGG(R/I/L)(L/I)(L/F/I/V)(I/L) (Table II and Fig. 1C). The central glycines of this region are highly conserved, and at least one of them is found in all sequences except seven of the RNA methyltransferases and the O-demethylpuromycin methyltransferase in Streptomyces alboniger.
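Motif III can be handled the same way as motif I. The sketch below (hypothetical names, an illustration only) encodes the motif III consensus as a regular expression and, separately, checks the weaker criterion used in the text: whether at least one of the two central glycines, positions 5 and 6 of the aligned 10-residue block, is present.

```python
import re

# Motif III consensus LL(R/K)PGG(R/I/L)(L/I)(L/F/I/V)(I/L), one class per position.
MOTIF_III = ["L", "L", "RK", "P", "G", "G", "RIL", "LI", "LFIV", "IL"]
MOTIF_III_RE = re.compile(
    "".join(c if len(c) == 1 else f"[{c}]" for c in MOTIF_III)
)


def matches_consensus(block):
    """True if a 10-residue aligned block matches the motif III consensus."""
    return MOTIF_III_RE.fullmatch(block) is not None


def has_central_glycine(block):
    """True if position 5 or 6 (1-based) of the aligned block is glycine."""
    if len(block) != len(MOTIF_III):
        raise ValueError("expected a 10-residue aligned block")
    return "G" in (block[4], block[5])
```

A PIMT-like block such as QLKPGGRLIL misses the strict consensus at its first position yet keeps both central glycines, which illustrates why the glycine criterion is the more permissive test.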

Methyltransferase sequence motifs in other AdoMet- or AdoHcy-utilizing enzymes. We then examined the sequences of other enzymes that utilize the substrate or the product of methyltransferases, AdoMet or AdoHcy, to ascertain whether they, too, possessed motifs I, II, and III. AdoMet decarboxylase (EC 4.1.1.50) in E. coli possesses all three sequence motifs in the same relative positions as in the methyltransferases. However, the three known
FIG. 2. Distance between methyltransferase sequence motifs. The number of residues between motifs I and II or between motifs II and III for each enzyme was tabulated. A single average value was calculated for each of the 37 distinct types of enzymes in Table I. The frequency of occurrence at two-residue intervals was calculated and tabulated (e.g., "34" = 34-36, etc.). The relative frequency was calculated by dividing the frequency for each observation by the sum of the frequencies. (A) Distances between motifs I and II in intervals of two residues. (B) Distances between motifs II and III in intervals of two residues.



mammalian AdoMet decarboxylases possess only motif III (Table III), while the yeast enzyme does not appear to have any of these motifs. We then examined the sequences of the AdoHcy hydrolases (EC 3.3.1.1) for the presence of methyltransferase sequence motifs. We found that all seven AdoHcy hydrolase sequences possess the three motifs (Table III). The interval between motif I and motif II is 65 residues in all cases, and the interval between motif II and motif III is 33 ± 1 SD residues. The first interval is comparable to that in methyltransferases (57 ± 13 SD), and the second interval is somewhat larger than that found for the methyltransferases (22 ± 5 SD).
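Interval comparisons like these rest on the binning described in the Fig. 2 legend: one averaged distance per distinct enzyme type, then relative frequencies in two-residue bins. The sketch below assumes that a bin labeled n covers distances n ≤ d < n + 2; that convention and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def distance_histogram(distances, bin_width=2):
    """Relative frequency of per-activity mean distances in fixed-width bins.

    distances: list of (activity, residues_between_motifs) pairs.
    Each activity contributes one averaged value, as in the Fig. 2 legend.
    Returns {bin_start: relative_frequency}, ordered by bin start.
    """
    by_activity = defaultdict(list)
    for activity, distance in distances:
        by_activity[activity].append(distance)
    counts = defaultdict(int)
    for values in by_activity.values():
        start = int(mean(values) // bin_width) * bin_width
        counts[start] += 1
    total = sum(counts.values())
    return {start: counts[start] / total for start in sorted(counts)}
```

Averaging per activity before binning keeps a heavily sequenced enzyme family from dominating the histogram, the same motivation behind the fractional weights in Fig. 1.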

The AdoMet synthetases (EC 2.5.1.6) possess only motifs I and III, separated by 107 residues. The consensus for motif I in both the AdoMet synthetases and the AdoHcy hydrolases, however, is GXGDXG. This corresponds to the ATP-binding motif in protein kinases, GXGXXG, that is also found in other nucleotide binding proteins such as NAD-binding proteins, ras-like rho proteins, and in ATP synthase β and adenylate kinase (18, 19). In the AdoMet synthetases and the AdoHcy hydrolases, motif III has only one of the two central glycines found in this motif in the methyltransferases. However, the motif does end in four generally hydrophobic residues (Table III).

Predictive potential of methyltransferase sequence motifs. Computerized searches of the protein sequence databases may be used to detect sequence similarities between proteins of known function and uncharacterized ORFs. However, search queries of the database with full-length protein sequences may fail to detect database entries that have only noncontiguous, short homologies to the query sequence. A search of the NBRF-PIR release 36, translated GenBank release 77, and Swissprot release 25 databases using the consensus sequences we defined for motif I, II, or III, and allowing two, one, and two mismatches, respectively, retrieved 424 entries out of a total of 73,582. After elimination of duplicate entries, 353 entries, or 0.48% of the total database, remained. Of these, 37 were methyltransferases representing 21 distinct enzymatic activities that were described among the 68 enzymes in Table I that possess at least one of the motifs and are included in the versions of the database indicated above (data not shown). An additional three DNA methyltransferases that were not included in this study were also retrieved. This search also retrieved 30 hypothetical proteins or ORFs. Of these, 27 were retrieved with motif I, five were retrieved with motif III, and two were retrieved with both motifs I and III. Visual inspection of these sequences found seven entries that had two or three of the methyltransferase motifs in the expected order and spacing (Table IV). None of the retrieved proteins, other than methyltransferases or putative methyltransferases, were found to contain more than one of the three sequence motifs. Most of the ORFs in Table IV show sequence similarities of 20% or greater over 200 or more residues to one or more known methyltransferases, further supporting the conclusion that these ORFs encode methyltransferases. The ORFs JS0718, IN37-SPIOL, YYAP-YEAST, and YCPW-PSEA9 show a greater degree of sequence similarity to a group of related small-molecule O-methyltransferases, UbiG, COQ3, and EryG, than to other methyltransferases (Table IV). While this similarity may be suggestive, no sequence blocks that are diagnostic of this particular subgroup of O-methyltransferases have been identified, so inferences regarding the substrate specificity of these ORFs can only be tentative.
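The consensus search described above, which allowed two, one, and two mismatches for motifs I, II, and III, can be approximated by a simple sliding-window scan. This is an assumption-laden sketch rather than the search software actually used: the names are hypothetical, a mismatch is counted wherever a residue falls outside the consensus class at that position, and gaps are not modeled.

```python
# Consensus classes from the text and the mismatch allowance for each motif.
MOTIFS = {
    "I": (["VIL", "LV", "DE", "VI", "G", "GC", "G", "TP", "G"], 2),
    "II": (["PG", "QT", "FYA", "D", "A", "IVY", "FI", "CVL"], 1),
    "III": (["L", "L", "RK", "P", "G", "G", "RIL", "LI", "LFIV", "IL"], 2),
}


def count_mismatches(window, classes):
    """Number of positions where the residue is outside the allowed class."""
    return sum(1 for residue, allowed in zip(window, classes) if residue not in allowed)


def scan(sequence, classes, max_mismatches):
    """All (start, mismatches) windows within the mismatch allowance."""
    width = len(classes)
    hits = []
    for start in range(len(sequence) - width + 1):
        mm = count_mismatches(sequence[start:start + width], classes)
        if mm <= max_mismatches:
            hits.append((start, mm))
    return hits


def scan_all(sequence):
    """Scan one sequence for every motif using that motif's own allowance."""
    return {name: scan(sequence, classes, k) for name, (classes, k) in MOTIFS.items()}
```

Short, degenerate patterns like these match by chance in unrelated proteins, which is consistent with the many single-motif hits reported above; requiring two or three motifs in the expected order and spacing is what distinguished the candidates in Table IV.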

A more stringent consensus search, allowing only a 
single mismatch in each motif, retrieved only 102 entries, 
of which 25 were methyltransferases, representing 13 dis- 
tinct enzymatic activities, three were putative methyl- 
transferases, and 75 were unrelated proteins possessing 
a single consensus motif (data not shown). 
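The mismatch-tolerant consensus search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' search software, which is not described here. The names `MOTIF_I`, `mismatches`, and `scan` are hypothetical. The sketch assumes that each consensus position is a set of allowed residues and that a window counts as a hit when the number of disallowed residues is at or below the mismatch threshold. The consensus encoded is the motif I consensus given in the Discussion.

```python
from typing import List, Tuple

# Motif I consensus from the Discussion: (V/I/L)(L/V)(D/E)(I/V)G(G/C)G(T/P)G.
# Each string lists the residues accepted at that position (hypothetical encoding).
MOTIF_I = ["VIL", "LV", "DE", "IV", "G", "GC", "G", "TP", "G"]


def mismatches(window: str, consensus: List[str]) -> int:
    """Count positions where the residue is not among those allowed by the consensus."""
    return sum(1 for residue, allowed in zip(window, consensus) if residue not in allowed)


def scan(seq: str, consensus: List[str], max_mismatches: int) -> List[Tuple[int, str, int]]:
    """Slide the consensus along seq; report (1-based start, window, mismatches) for each hit."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        m = mismatches(window, consensus)
        if m <= max_mismatches:
            hits.append((i + 1, window, m))
    return hits


if __name__ == "__main__":
    print(scan("VLIIGGGDG", MOTIF_I, max_mismatches=2))  # -> [(1, 'VLIIGGGDG', 2)]
    print(scan("VLIIGGGDG", MOTIF_I, max_mismatches=1))  # -> []
    print(mismatches("ILDAGAGVG", MOTIF_I))              # -> 3
```

Under this scoring, the E. coli motif I sequence VLIIGGGDG from Table III has two mismatches. It is therefore retrieved when two mismatches are allowed but not in the single-mismatch search. The PstI sequence ILDAGAGVG scores three mismatches, which fits its matching only the more general hh(D/E)hGXGXG pattern.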

DISCUSSION 

Previous work has shown that many AdoMet- and AdoHcy-utilizing enzymes, including methyltransferases, AdoMet synthetases, AdoMet decarboxylases, and AdoHcy hydrolases, have a common sequence element termed here motif I (4, 5, 7-12). In this work, we have shown that motif I has the consensus (V/I/L)(L/V)(D/E)(I/V)G(G/C)G(T/P)G and is present in most methyltransferases sequenced to date, including 69 non-DNA methyltransferases representing 29 distinct enzymes. A more limited consensus, hh(D/S)(L/P)FXGXG, was identified in DNA cytosine-specific and adenine-specific methyltransferases (6). In these enzymes, the phenylalanine and the two glycines are the most conserved. However, the other conserved residues may also be found in some of these sequences. For example, the PstI adenine methyltransferase has the sequence ILDAGAGVG, which matches the more general methyltransferase motif I consensus hh(D/E)hGXGXG, where "h" is a hydrophobic residue (9). A recent determination of the crystal structure of the HhaI DNA m5C methyltransferase has confirmed the involvement of motif I in AdoMet binding (17). Motif I comprises part of the AdoMet binding pocket of this enzyme, and the conserved phenylalanine, along with other hydrophobic residues, forms a hydrophobic platform on one side of the purine and ribose rings of AdoMet. The glycines in this motif form a tight loop responsible for properly positioning the AdoMet in the binding pocket (17).

FIG. 3. Distribution of the aromatic amino acids phenylalanine (F), tyrosine (Y), and tryptophan (W) in and around methyltransferase motif II. The number of aromatic residues at each position from 9 residues N-terminal to 10 residues C-terminal of the central aspartate of motif II was tabulated. Individual enzymes were assigned a fractional weighting factor as described in Fig. 1. The weighted sum of aromatic residues at each position was then divided by 26, the number of distinct enzymatic activities possessing motif II. Dayhoff frequency, combined natural abundance of F, Y, and W (14).

The sequence heterogeneity of this region has, on occasion, made this motif difficult to detect. We were able to make use of multiple alignments of clusters of related methyltransferases to detect the presence of motif I in two methyltransferase sequences that had previously been reported to lack motif I. These are the midamycin O-methyltransferase (mdmC; Ref. 20) and the caffeoyl-CoA 3-O-methyltransferase (21). This method has the advantage of being able to align sequences of interest with similar sequences that have well-defined methyltransferase motifs. It is often possible to identify motifs by visual inspection in such an expanded context.

Two other regions, termed motifs II and III, were originally described in a smaller number of methyltransferases (7). Other investigators have not always identified these motifs in recently sequenced methyltransferases, and hence some have questioned the general significance of these regions (8-12). However, we have been able to use multiple alignments to show that motifs II and III are present in both UbiG (9) and EryG (8), for example. We found that motif II is present at a distinct interval of 57 ± 13 SD residues C-terminal to motif I and is present in nearly as many distinct enzymes as motif I (26 for the former, 29 for the latter). The main exceptions to this rule are the RNA methyltransferases and a number of the porphyrin precursor methyltransferases (Table I). Motif II is also found in the AdoHcy hydrolases and in the E. coli AdoMet decarboxylase, positioned 65 and 60 residues, respectively, from the end of motif I (Table III). However, in the AdoHcy hydrolases, only positions 4-8 are conserved with respect to the sequences found in methyltransferases.

Motif II is abundant in aromatic residues. At positions -8, -1, and +3 with respect to the central aspartate, there are respectively 3.9, 6.1, and 3.6 times more phenylalanine, tyrosine, and tryptophan residues than expected from the combined Dayhoff frequencies for these residues (Fig. 3). The possibility that the aromatic residues in or in close proximity to motif II may be involved in binding AdoMet is supported by recent work showing that positively charged quaternary ammonium, quinolinium, and sulfonium compounds can be bound by aromatic groups in
TABLE III

Motifs I-III in AdoMet Decarboxylases, AdoHcy Hydrolases, AdoMet Synthetases, and NTP Binding Proteins

Sequence                          Motif I           Motif II        Motif III
Methyltransferase consensus       VLDIGGGTG         PQFDAIFC        LLRPGGRLLI

AdoMet decarboxylase majority                                       DLIPGSVIDAT
AdoMet decarboxylase (E. coli)    81 VLIIGGGDG      150 QTFDVIIS    181 CLNPGGIFVAQ
AdoMet decarboxylase (human)                                        211 DLIPGSVIDAT
AdoMet decarboxylase (rat)                                          211 DLIPGSVIDAT
AdoMet decarboxylase (hamster)                                      211 DLIPGSVIDAT

AdoHcy hydrolase majority                           MbDDAIVC(a)     LKNGRbhhhh(a)
AdoHcy hydrolase (human)          216 AWAGYGDVGK    290 MKDDAIVC    329 RLKNGRRIILL
AdoHcy hydrolase (rat)            216 AWAGYGDVGK    290 MKDDAIVC    329 LLKNGHRIILL
AdoHcy hydrolase (C. elegans)     218 AWAGYGDVGK    292 LPNDAIVC    331 TLKNGRHVILL
AdoHcy hydrolase (parsley)        265 ALIAGYGDVGK   339 MKNNAIVC    380 FPDTGRGIIIL
AdoHcy hydrolase (Leishmania)     215 CCVCGYGDVGK   289 MRDDAIVC    328 TMENGRHIILL
AdoHcy hydrolase (mold)           215 AWAGYGDVGK    289 MKEDAIVC    327 TLANGVHIILL
AdoHcy hydrolase (T. aestivum)    265 AWCGYGDVGK    339 MKNNAIVC    380 FPETKTGIIVL

AdoMet synthetase majority        EalGAGDQG                         LNPSGRFVIG
AdoMet synthetase (human)         127 EEDIGAGDQGL                   243 HLQPSGRFVIG
AdoMet synthetase (Arabidopsis)   115 PEDIGAGDQGH                   231 HLNPSGRFVIG
AdoMet synthetase (poplar)        116 PEEIGAGDQGH                   232 HLNPSGRFVIG
AdoMet synthetase (yeast)         116 LEDLGAGDQGI                   232 FIQPSGRFVIG
AdoMet synthetase (E. coli)       113 PLEQGAGDQGL                   225 FINPTGRFVIG

a Residue-class symbols: acidic, D or E; h, M, A, I, L, or V; aromatic, F, Y, or W; b, H, K, or R.



cation-π interactions that occur between the positively charged group and the π electrons of the aromatic ring (22, 23). For example, an artificial host molecule rich in aromatic rings can bind acetylcholine with a Kd of 50 μM (22, 23). [3H]Acetylcholine mustard, a reactive analog of acetylcholine, was found to label the Torpedo nicotinic acetylcholine receptor at Tyr-93 on the α-subunit in the conserved region WXPDhhhYN, where "h" is a hydrophobic residue, at the proposed binding site of acetylcholine (24). Recent crystallographic and affinity labeling studies of binding sites of the Torpedo acetylcholinesterase for quaternary ligands have demonstrated that the quaternary groups interact with the indole rings of Trp-84 and Trp-279 in this enzyme's active site (25).
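The enrichment values in Fig. 3 come from a weighted tally of aromatic residues at each offset from the central aspartate of motif II. That tally is divided by the number of enzymatic activities and then compared with the combined Dayhoff frequency of F, Y, and W. A minimal Python sketch of the calculation follows. The Fig. 1 weighting scheme is not reproduced in this section, so the sketch assumes each sequence is weighted by 1/(number of sequences for its enzymatic activity). The function name and the input format are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

AROMATIC = set("FYW")


def aromatic_enrichment(
    windows: List[Tuple[str, str]],
    background: float,
    n_before: int = 9,
    n_after: int = 10,
) -> Dict[int, float]:
    """Observed/expected aromatic content at each offset from the central Asp.

    windows: (activity label, segment) pairs; each segment holds n_before residues,
    the central aspartate, then n_after residues.
    background: expected combined frequency of F, Y and W.
    Assumption: each sequence is weighted by 1 / (number of sequences for its activity),
    so every enzymatic activity contributes a total weight of 1.
    """
    length = n_before + 1 + n_after
    per_activity = Counter(label for label, _ in windows)
    weighted = [0.0] * length
    for label, segment in windows:
        if len(segment) != length:
            raise ValueError(f"segment {segment!r} is not {length} residues long")
        weight = 1.0 / per_activity[label]
        for j, residue in enumerate(segment):
            if residue in AROMATIC:
                weighted[j] += weight
    n_activities = len(per_activity)
    # Weighted aromatic count per activity, relative to the background expectation.
    return {j - n_before: weighted[j] / n_activities / background for j in range(length)}
```

For real motif II data, each segment would span 9 residues N-terminal and 10 residues C-terminal of the central aspartate, which are the defaults. `background` would then be the combined Dayhoff frequency of F, Y, and W. A value of roughly 0.08 is assumed here, because this section does not state the number.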

Further chemical studies of cation-π interactions found that an aromatic host molecule can catalyze methyl group transfer from an aryldimethylsulfonium compound to thiocyanate to form methyl thiocyanate (23). It was hypothesized that catalysis is effected by stabilizing the positively charged transition state through cation-π interactions. These authors suggest that the positively charged sulfonium on AdoMet may be bound and stabilized by methyltransferases through the same type of interaction, and predict that the active sites of methyltransferases may be rich in aromatic residues.

Motif III is found at an interval of 22 ± 5 SD residues C-terminal to motif II in 28 distinct enzymes. It is absent only in a number of RNA methyltransferases (Tables I and II). It is also found in AdoMet decarboxylases, AdoMet synthetases, and AdoHcy hydrolases. However, usually only one of the two conserved central glycines is found in these latter enzymes. Site-directed mutagenesis of motif III of the rat guanidinoacetate N-methyltransferase at five individual positions did not markedly alter the Km for AdoMet or the catalytic activity of this enzyme, although a twofold range of Ki values for AdoHcy was observed (11). These investigators thus question whether this region forms part of the active site (11, 12). However, the 10-residue motif III of this enzyme would still match the consensus sequence for this motif in 9 of
TABLE IV

Methyltransferase Motifs in Hypothetical ORFs in the Sequence Databases

Each entry gives the organism and database identifier(a), the starting residue (and sequence, where given) of motifs I, II, and III, the intervals between motifs I and II(b) and between motifs II and III(c), and the database homologies(d).

Methyltransferase consensus: motif I VLDIGGGTG; motif II PQFDAIFC; motif III LLRPGGRLLI; intervals 59 and 22.

E. coli: YIGO_ECOLI (SW). Motifs at residues 67 (I), 132 (II), and 159 (III); intervals 57 and 20. Homologies: PIMT (human) (24.2%, 145 aa); PrmA (23.9%, 200 aa).

JS0718 (PIR)(e). Motifs at residues 60 (II) and 87 (III, ILKPGGRLIV); interval II-III 20. Homology: EryG (21.7%, 174 aa).

S. fradiae: YT37_STRFR (SW). Motifs at residues 136 (I), 192 (II), and 219 (III, VLRPGGRLVH); interval II-III 20. Homology: PIMT (bovine) (24.5%, 183 aa).

YCPW_PSEA9 (SW)(f). Motif I at residue 57 (IIDAGCGSG). Homology: COQ3 (yeast) (30.8%, 86 aa).

Yeast: L12000 (GB). Motifs at residues 168 (I), 241 (II), and 272 (III); intervals 65 and 24. Homology: PNMT (rat) (16.3%, 256 aa).

Yeast: YYAP_YEAST (SW). Motifs at residues 123 (I), 186 (II), and 213 (III, VLKPGGTLVM); intervals 55 and 20. Homologies: PrmA (25.4%, 131 aa); EryG (24.3%, 233 aa); UbiG (23.6%, 178 aa).

Yeast: YCT7_YEAST (SW). Motifs at residues 51 (I), 109 (II), and 147 (III); intervals 50 and 31. Homology: GNMT (rabbit) (24.6%, 232 aa).

Spinach: IN37_SPIOL (SW). Motifs at residues 121 (I), 181 (II), and 208 (III); intervals 52 and 20. Homologies: UbiG (26.1%, 129 aa); EryG (21.2%, 293 aa).

a PIR, NBRF-PIR release 36; GB, GenBank release 77; SW, Swissprot release 25.
b Interval between motifs I and II.
c Interval between motifs II and III.
d Highest similarity scores in a Lipman-Pearson alignment of the ORFs to the methyltransferases in Table I or to other ORFs in this table. The alignment parameters were ktup = 2, gap penalty = 4 (a different value was used for YCPW_PSEA9 vs. COQ3), and gap length penalty = 6.
e Partial C-terminal sequence.
f Partial N-terminal sequence.



10 positions, which is also the case in a variety of other
methyltransferases (Table II).

There are three lines of evidence that suggest that the three motifs are involved in binding AdoMet or AdoHcy. First, the crystal structure of the HhaI m5C DNA methyltransferase complexed with AdoMet, discussed above, has directly demonstrated that motif I is involved in AdoMet binding. Second, the motifs are present in a large number of methyltransferases with diverse functions. As AdoMet and AdoHcy binding is the common feature of these diverse enzymes, the presence of three distinctly spaced sequence motifs suggests evolutionary conservation of protein regions involved in this activity. Third, an experiment aimed at directly characterizing the interaction of AdoMet with methyltransferases has found evidence for the interaction of residues in the vicinity of motif II with AdoMet. Tyr-136, located three residues C-terminal to motif II of the rat guanidinoacetate N-methyltransferase (EC 2.1.1.2), is exclusively photolabeled with S-adenosyl-[methyl-3H]methionine (26). In addition, the competitive inhibitors AdoHcy and sinefungin were able to block effectively the photoincorporation of radioactivity. This finding is consistent with the involvement of aromatic residues in this region in AdoMet binding, as discussed earlier.

There are, however, a number of methyltransferases that apparently do not possess any of these three motifs or additional elements of sequence similarity to other methyltransferases. These enzymes include the isoprenylcysteine protein carboxyl methyltransferase, the diphthamide biosynthesis N-methyltransferase, the phospholipid methyltransferases PEM1 and PEM2, and the bacterial rRNA methyltransferases Sgm and Grm (see Table I). These results suggest convergent evolution of AdoMet-dependent methylation. With the possible exception of the STE14 isoprenylcysteine protein carboxyl methyltransferase and the PEM2 phospholipid methyltransferase, there are no obvious sequence similarities among the methyltransferases that lack the three sequence motifs discussed here or between these methyltransferases and other proteins in the database. It is possible, therefore, that many of these enzymes represent independent approaches to implementing AdoMet-dependent methylation. Likewise, crystallographic studies of the AdoMet-binding MetJ repressor in E. coli show an independent approach to AdoMet binding. In this protein, AdoMet is bound by sequence segments unrelated to the motifs discussed in this paper and is found at the interface between the protein and the MetJ-binding DNA sequence (27).

As the number of unidentified ORFs in the databases continues to increase, the use of these methyltransferase sequence motifs, coupled with a knowledge of their expected spacing, may be useful in discerning candidate methyltransferases among these new sequences. Conventional sequence homology searches often fail to detect relationships between distantly related proteins because the overall sequence similarity is low and similarities may only be found in noncontiguous, short regions of local similarity. A query of the combined NBRF-PIR release 36, translated GenBank release 77, and Swissprot release 25 databases using the consensus motifs defined here was able to pick out 37 methyltransferases that also appear in Table I. As only one of these (caffeoyl-CoA O-MT) was identified exclusively with motif II, it appears that motifs I and III are the most useful for database searching. Although the consensus search allowing for two mismatches in motifs I and III and one mismatch in motif II also yielded approximately 300 false positives, these can be readily distinguished from methyltransferase candidates, as only a single motif is found upon further visual inspection. Because there is significant heterogeneity in the methyltransferase sequence motifs, the consensi that we have defined did not detect some of the methyltransferases listed in Table I. While it is possible to increase the number of hits by defining the consensi less stringently, this also increases the number of false positives. A more fruitful approach may be to employ less stringent consensus sequences and a search algorithm that is able to screen for the ordering and the spacing between the motifs. Work in this direction is currently underway in our laboratory.
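The proposed strategy combines single-motif hits with the expected order and spacing of the motifs. It could look like the Python sketch below, which is a hypothetical illustration rather than the laboratory's program. It uses the interval statistics reported earlier: 57 ± 13 residues from motif I to motif II, and 22 ± 5 residues from motif II to motif III. It accepts gaps within k standard deviations of the mean, where k = 2 is an arbitrary choice. It also assumes that an interval is counted from the last residue of one motif to the first residue of the next.

```python
from itertools import product
from typing import List, Tuple

# Interval statistics reported in the text (mean, SD), in residues.
I_TO_II = (57.0, 13.0)
II_TO_III = (22.0, 5.0)
# Lengths of the motif I and motif II consensus sequences (VLDIGGGTG, PQFDAIFC).
LEN_I, LEN_II = 9, 8


def within(gap: int, stats: Tuple[float, float], k: float) -> bool:
    """True if gap lies within mean +/- k standard deviations."""
    mean, sd = stats
    return abs(gap - mean) <= k * sd


def ordered_triples(
    hits_i: List[int], hits_ii: List[int], hits_iii: List[int], k: float = 2.0
) -> List[Tuple[int, int, int, int, int]]:
    """Combine single-motif hits (1-based start positions) into I < II < III candidates.

    Assumption: an interval runs from the last residue of the preceding motif
    to the first residue of the next one.
    """
    candidates = []
    for a, b, c in product(hits_i, hits_ii, hits_iii):
        gap_1 = b - (a + LEN_I - 1)
        gap_2 = c - (b + LEN_II - 1)
        if gap_1 > 0 and gap_2 > 0 and within(gap_1, I_TO_II, k) and within(gap_2, II_TO_III, k):
            candidates.append((a, b, c, gap_1, gap_2))
    return candidates


if __name__ == "__main__":
    # Start positions listed for YYAP_YEAST in Table IV.
    print(ordered_triples([123], [186], [213]))  # -> [(123, 186, 213, 55, 20)]
```

Applied to the YYAP_YEAST start positions, the sketch gives intervals of 55 and 20 residues and accepts the triple. Widening k admits more widely spaced candidates at the cost of more false positives. This mirrors the stringency trade-off described for the consensus sequences themselves.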

Our knowledge of methyltransferase sequences is still 
fragmentary. There are 103 distinct AdoMet-dependent 
methyltransferases that have been assigned EC numbers 
(1) and at least another 11 distinct enzymes listed in Table 
I that have not yet been assigned EC numbers. For ex- 
ample, while several sequences of protein carboxyl meth- 
yltransferases are known (see Table I), no sequences are 
yet available for protein arginine, histidine, lysine, or N-terminal
α-amino methyltransferases (28). The 37 distinct
methyltransferases whose sequences were analyzed in this 
study thus comprise less than one-third of these enzymes. 
Sequence determinations of more of these enzymes, as 
well as of yet-to-be-discovered methyltransferases, may 
be of use in discerning additional blocks of sequence sim- 
ilarity between enzymes of related function to understand 
better the evolutionary relationships between the various 
methyltransferases. 
Crystal structure of the chemotaxis receptor methyltransferase CheR suggests a conserved structural motif for binding S-adenosylmethionine

Snezana Djordjevic and Ann M Stock* 



Background: Flagellated bacteria swim towards favorable chemicals and away from deleterious ones. The sensing of chemoeffector gradients involves chemotaxis receptors, transmembrane proteins that detect stimuli through their periplasmic domains and transduce signals via their cytoplasmic domains to the downstream signaling components. Signaling outputs from chemotaxis receptors are influenced both by the binding of the chemoeffector ligand to the periplasmic domain and by methylation of specific glutamate residues on the cytoplasmic domain of the receptor. Methylation is catalyzed by CheR, an S-adenosylmethionine-dependent methyltransferase. CheR forms a tight complex with the receptor by binding a region of the receptor that is distinct from the methylation site. CheR belongs to a broad class of enzymes involved in the methylation of a variety of substrates. Until now, no structure from the class of protein methyltransferases has been characterized.

Results: The structure of the Salmonella typhimurium chemotaxis receptor methyltransferase CheR bound to S-adenosylhomocysteine, a product and inhibitor of the methylation reaction, has been determined at 2.0 Å resolution. The structure reveals CheR to be a two-domain protein, with a smaller N-terminal helical domain linked through a single polypeptide connection to a larger C-terminal α/β domain. The C-terminal domain has the characteristics of a nucleotide-binding fold, with an insertion of a small antiparallel β sheet subdomain. The S-adenosylhomocysteine-binding site is formed mainly by the large domain, with contributions from residues within the N-terminal domain and the linker region.
Conclusions: The CheR structure shares some structural similarities with small-molecule, DNA and RNA methyltransferases, despite a lack of sequence similarity among them. In particular, there is significant structural preservation of the S-adenosylmethionine-binding clefts; the specific length and conformation of a loop in the α/β domain seems to be required for S-adenosylmethionine binding within these enzymes. Unique structural features of CheR, such as the β subdomain, are probably necessary for CheR's specific interaction with its substrates, the bacterial chemotaxis receptors.



Introduction 

The methyltransferases are a large and diverse class of enzymes that catalyze the transfer of methyl groups from S-adenosylmethionine (AdoMet) to a wide range of substrates, including small molecules, nucleic acids and proteins. Despite the use of a common cofactor, AdoMet, the mechanism of methyl transfer is not conserved among methyltransferases. This may be reflected in the lack of distinguishing sequence motifs that define the active sites of methyltransferases, comparable to the consensus sequences characteristic of the nucleotide-binding sites of kinases and dehydrogenases. Although AdoMet is the second most commonly used cofactor after ATP, there are relatively few structural descriptions of AdoMet-binding sites.



The three-dimensional structures of several DNA methyltransferases [1-3], an RNA methyltransferase [4] and two small-molecule methyltransferases [5,6] have been determined, but structural information regarding protein methyltransferases is lacking. Protein methyltransferases are diverse both with respect to the target amino acid modified by methylation and the proposed role of the modification [7]. Some protein methylations are irreversible and are assumed to have a structural role, such as the methylation of α-amino groups of a variety of N-terminal amino acids or the N-methylations of histidine, arginine and lysine sidechains. Methylation at protein carboxyl groups, however, is reversible and appears to function more dynamically. Methylation of aspartyl sidechains is proposed to be involved in repair of damaged proteins [8,9], and methylations of the C-terminal α-carboxyl groups of eukaryotic proteins [10,11] and of the glutamyl sidechain carboxyl groups in bacterial receptors [12,13] are involved in signal transduction.

Bacterial chemotaxis transmembrane receptors are reversibly modified by methylation of four to five glutamate residues within their cytoplasmic domains (reviewed in [14,15]). The methylation-demethylation reactions are catalyzed by CheR, an AdoMet-dependent protein methyltransferase, and CheB, a methylesterase/amidase. Methylation of the chemoreceptors counterbalances the effects of ligand binding and contributes to the phenomenon of adaptation by resetting the signaling activity of the receptors despite the continued presence of stimulus. Methylation of the receptors is highly regulated by multiple mechanisms. The regulation of enzyme activity occurs primarily through control of the methylesterase CheB, a response regulator protein that is activated by a two-component phosphotransfer pathway [16,17]. Additionally, both methylating and demethylating reactions are influenced by the specific conformation of the receptors, which presumably affects accessibility of the glutamate residues [18-20]. Each specific glutamate residue is methylated at a characteristic rate, which correlates with the magnitude of the effect that mutating that residue has on chemotaxis [21,22]. The methyltransferase CheR binds to a five-amino-acid tail at the C termini of some chemoreceptors; this tail is distant in primary sequence from CheR's sites of methylation [23]. The data suggest that from this tethered position, the methyltransferase methylates other chemoreceptor dimers through inter-dimer interactions.

The structure of the Salmonella typhimurium chemotaxis receptor methyltransferase CheR bound to S-adenosylhomocysteine (AdoHcy), a product and inhibitor of the methylation reaction, has been determined at 2.0 Å resolution. This is the first report of the structure of a protein methyltransferase. Unlike catechol O-methyltransferase and cytosine-DNA methyltransferases, the active site of CheR involves neither metal ions nor a methylcysteinyl intermediate in catalysis. In this respect, CheR is similar to the adenine-DNA methyltransferase and the RNA methyltransferase vaccinia protein VP39. Although there are specific differences in topologies, presumably due to the nature of substrates for each enzyme, the structure of CheR confirms that methyltransferases from all four substrate groups (proteins, DNA, RNA and small molecules) share some structural features. Structural analysis of the four classes of enzymes indicates that the AdoMet-binding site is characterized by the specific conformation of a β1/αA loop within the α/β domain, with additional interactions contributed by residues of a linker to an additional domain that is commonly involved in substrate recognition. These features are specific to methyltransferases and distinct from cofactor-binding sites of other nucleotide-binding proteins. In addition to the interest in CheR as an AdoMet-dependent protein methyltransferase, its structure provides a foundation for beginning to explore the complex interactions between a chemotaxis receptor modification enzyme and its multiple substrates.

Results and discussion 

Structure determination 

The structure of S. typhimurium CheR in a complex with AdoHcy was determined at 2.0 Å resolution by multiple isomorphous replacement (MIR). The crystals belong to the monoclinic space group P2₁. Diffraction data for native and several heavy-atom derivatized crystals were obtained as summarized in Table 1. A native data set for a crystal equilibrated at pH 7.0 was also collected. These data merged well with data from the original native crystal grown at pH 5.6 (Rmerge = 4.9%); thus, a more suitable neutral pH was chosen for preparation of the derivatives. An electron-density map that was calculated with density-modified MIR phases is shown in Figure 1.

The atomic model of CheR was refined by using a combination of X-PLOR and ARP procedures (see Materials and methods for details). The final model has good geometry, with only one residue, Ser125, outside of the allowed regions of a Ramachandran plot. Ser125 is located in the core of the molecule and is involved in formation of the cofactor-binding site. The unusual backbone conformation of this residue is perhaps unremarkable, because active-site residues commonly acquire unusual phi and psi angles.

The final model contains 2224 non-hydrogen protein atoms corresponding to residues 11 to 284, 26 non-hydrogen atoms belonging to AdoHcy and 110 solvent molecules. The crystallographic R factor for this model is 19.6% for 18035 reflections between 8.0 and 2.0 Å resolution, and the free R factor, calculated with 5% of the data, is 27.8%. The root mean squared (rms) deviations from ideal geometry are 0.014 Å for bond lengths and 1.7° for bond angles. A summary of the overall quality of the model is presented in Table 2.
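The R factor and free R factor quoted above follow the usual definition, R = Σ||Fo| - |Fc||/Σ|Fo|. The free R is computed over a randomly chosen test set, here 5% of the data, that is left out of refinement. The Python sketch below only illustrates these quantities: it is not X-PLOR or ARP code, and the function names are hypothetical. It also assumes a uniform random choice of the test set, because the section does not say how the 5% was selected.

```python
import random
from typing import List, Sequence, Tuple


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def free_set_flags(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> List[bool]:
    """Randomly flag a test set (5% by default) to be excluded from refinement."""
    rng = random.Random(seed)
    n_test = max(1, round(fraction * n_reflections))
    flags = [False] * n_reflections
    for index in rng.sample(range(n_reflections), n_test):
        flags[index] = True
    return flags


def r_work_and_free(
    f_obs: Sequence[float], f_calc: Sequence[float], flags: Sequence[bool]
) -> Tuple[float, float]:
    """Return (R over unflagged working reflections, R over flagged test reflections)."""
    work = [(fo, fc) for fo, fc, is_test in zip(f_obs, f_calc, flags) if not is_test]
    test = [(fo, fc) for fo, fc, is_test in zip(f_obs, f_calc, flags) if is_test]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    flags = free_set_flags(18035)
    print(sum(flags))  # size of a 5% test set drawn from 18035 reflections -> 902
```

Because the test reflections never influence refinement, a free R that stays well above the working R, as with 27.8% versus 19.6% here, is the expected pattern. A free R that tracks the working R too closely would instead suggest that the test set was not truly independent.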

The N-terminal ten residues of CheR are not visible in electron-density maps. An N-terminal sequence analysis of the CheR protein, obtained by dissolving the native crystals, indicated that approximately 90% of the protein molecules started at residue Thr2 (indicating removal of the N-terminal Met), with a minor portion of the protein molecules starting at Gln15. These data are in accordance with an apparent disorder of the N-terminal end of the polypeptide chain. Additionally, four amino acids at the C terminus of CheR, residues 285-288, are not observed in electron-density maps.
Table 1

Data collection and MIR phasing.

Compound        Conc.   Soaking   Res.*   Compl.†   Rmerge‡   R factor§   No. of   Phasing   Rcullis#
                (mM)    time      (Å)     (%)       (%)       (%)         sites    power¶    (%)
native pH 5.6                     2.0     90        5.0
native pH 7.0                     2.0     93        4.9
C2H5ClHg        1       3 days    2.7     76        6.0       23.1        2        2.8       51
C2H5ClHg II     1       3 days    2.7     85        4.3       24.2        2        2.8       51
Baker's         1       3 days    2.7     87        5.3       23.3        3        1.4       73
Baker's II      1       3 days    2.7     89        4.8       34.3        2        1.4       81
DMA             1       3 days    3.0     78        7.6       42.3        3        1.5       72
K2PtCl4         0.5     3 h       3.0     85        5.7       34.7        3        1.2       83

*Resolution limit of phasing. †Completeness of data set. ‡$R_{merge} = \sum_h \sum_i |I_{hi} - \langle I_h \rangle| / \sum_h \sum_i \langle I_h \rangle$. §$R = \sum ||F_{PH}| - |F_P|| / \sum |F_P|$. ¶Rms amplitude of the heavy-atom F / residual lack of closure. #$R_{cullis} = \sum ||F_{PH} - F_P| - |F_H|| / \sum |F_{PH} - F_P|$.
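The Rmerge footnote of Table 1 measures how much repeated intensity measurements of the same reflection scatter about their mean. A minimal Python sketch follows; the function name and the dictionary input format are hypothetical. For each reflection, summing ⟨I_h⟩ over its measurements gives the same total as summing the individual I_hi. The denominator can therefore be written either as Σ_hΣ_i⟨I_h⟩, as in the footnote, or as Σ_hΣ_i I_hi, as in the code.

```python
from statistics import mean
from typing import Dict, List, Tuple

Hkl = Tuple[int, int, int]


def r_merge(observations: Dict[Hkl, List[float]]) -> float:
    """Rmerge = sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    observations maps each unique reflection h to its repeated intensity
    measurements I_hi.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        average = mean(intensities)
        numerator += sum(abs(i - average) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator
```

For example, two measurements of 90 and 110 for one reflection, plus two identical measurements of 100 for another, give 20/400, or an Rmerge of 5%.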



Protein architecture 

CheR is a two-domain mixed α/β protein. The α-carbon tracing, ribbon diagram and CPK model of CheR are shown in Figure 2. A topology diagram of CheR is shown in Figure 3. The smaller, N-terminal domain of CheR contains residues 11-90. This domain consists of four perpendicularly packed helices (α1-α4) and an extended terminus formed by residues 11-20, which are oriented away from the domain without interactions with the rest of the molecule. In the crystal, residues 11-20 are packed between two crystallographic symmetry-related molecules, forming few hydrogen bonds and exhibiting high temperature factors (~40 Å²). It is very likely that the conformation of this region is influenced by crystal packing. It is conceivable that the N-terminal twenty residues acquire a different conformation when interacting with the chemotaxis receptors or in the vicinity of the cytoplasmic membrane. Helix α4 of the N-terminal domain is connected to helix α5 of the C-terminal domain through an extended linker sequence (residues 91-98). A relatively small (~440 Å²) interface between the two domains contains eight hydrogen-bond interactions at the outer rim of the interface, with five hydrophobic residues grouped at the center. Association between the small N-terminal domain, the linker region and the large C-terminal domain is apparent in the CPK representation of the model (Fig. 2c).

Figure 1

Representative portion of an experimental electron-density map. A stereo image of a region of the final CheR model (black line) is shown superimposed with an MIR/DM electron-density map calculated to 2.7 Å and contoured at 1σ. The figure shows the electron density associated with residues 129-153.

The C-terminal domain is composed of residues 99-284. The core of this domain consists of a mixed seven-stranded β sheet that is flanked on both sides by α helices. The overall topology of the C-terminal α/β domain is common to other previously characterized methyltransferases (Fig. 3).

Table 2

Refinement statistics.

Number of atoms               2360
Resolution (Å)                2.0
Number of reflections         18035
Number of water molecules     110
R factor (%)                  19.6
Rfree* (%)                    27.8
Rms bond length (Å)           0.014
Rms bond angle (°)            1.7
Average thermal factors
  Mainchain (Å²)              21.0
  Sidechain (Å²)              25.1

*Rfree was calculated from 5% of reflection data.



For consistency, in the topological comparison, β strands and α helices are labeled according to the nomenclature previously used for the description of methyltransferases [24,25]. The basic topology represents a variation of a Rossmann-type α/β fold. This was also confirmed by the results of a structural similarity search carried out using DALI [26], which, in addition to methyltransferases, identified forty other structures with similarity greater than Z = 3.1σ (Z is the number of standard deviations above the mean). These molecules were mostly NAD- and FAD-dependent dehydrogenases and reductases. The methylesterase domain of CheB [27] also shares a structural similarity to CheR (Z = 3.3σ), even though this enzyme does not utilize a nucleotide cofactor. The structural similarity between CheR and CheB might reflect a common evolutionary origin or a requirement for a common scaffold that enables them to react with the same substrates, the chemotaxis receptors.



Figure 2 




The three-dimensional fold of CheR. (a) A stereo image of the Cα chain of CheR with residue numbers indicated. (b) A ribbon diagram (RIBBONS [56]) of CheR, showing α helices in green, β strands in yellow and 3₁₀ helical turns in blue. The AdoHcy molecule bound to CheR is shown in solid spheres. Colors are gray for carbon, blue for nitrogen, red for oxygen and yellow for sulfur atoms. (c) A CPK model of CheR, including all atoms. Green spheres indicate residues of the α/β domain and gold spheres indicate residues of the N-terminal helical domain and the linker region. Atoms of the AdoHcy molecule are colored the same as in (b). From this orientation, only gray carbon atoms of AdoHcy are visible. In these figures, the orientations of the model are approximately the same. Figures (a) and (c) were prepared using TURBO.
Figure 3

[Topology panels include: HhaI and HaeIII; TaqI; COMT; VP39]

Schematic topology diagram of CheR, catechol O-methyltransferase (COMT), HaeIII, HhaI and TaqI DNA methyltransferases, RNA methyltransferase VP39 and the catalytic domain of CheB. Helices and strands that are approximately perpendicular to the plane of the figure are represented by circles and triangles, respectively, and the helices that are close to being parallel to the plane of the figure are shown as rods. Dashed lines denote the positions of additional domains. Helical turns that are less than four residues long are not shown as helices in the figure.



A novel feature of CheR's C-terminal domain is the presence of a small subdomain composed of an α helix and a short antiparallel three-stranded β sheet. This subdomain (residues 166-199) is inserted after αB, at one edge of the β sheet. The subdomain stands almost independently, away from the rest of the structure, with only one region involved in interactions with the rest of the α/β domain. Approximately 420 Å² of the molecular surface of the α/β domain is buried by the subdomain. At the interface, six hydrophobic residues from the subdomain contribute to the hydrophobic interactions; two of these are tyrosines that also form two of a total of four hydrogen bonds.

The binding mode of the adenine portion of AdoMet resembles that of NAD in NAD-dependent enzymes, such as alcohol dehydrogenase [29]. The adenine ring is positioned within a hydrophobic pocket sandwiched between two relatively large aliphatic residues (Ile155 and Val232 in the case of CheR), while an acidic residue (Asp154) forms hydrogen bonds with the hydroxyl groups of the ribose ring. In addition, the amino group of the adenine ring forms a hydrogen bond with the backbone carbonyl oxygen of Ala38 within the N-terminal helical domain, and also with the sidechain carbonyl oxygen of Asn212. The homocysteine portion of AdoHcy extends away from the ribose, fitting into an elongated groove in the CheR molecule formed by residues from both the C-terminal domain and the linker region. The carboxyl oxygen atoms and the amino nitrogen of the homocysteine form hydrogen bonds either with sidechain atoms pointing towards the binding cleft or with the surrounding backbone atoms of CheR. All of the residues that are engaged in hydrogen bonds to the homocysteine are completely conserved among the CheR proteins from a wide variety of bacterial species. The residues forming the hydrophobic environment around the adenine ring, although not completely conserved, are highly similar in all of the CheR sequences.

Because cysteine residues are involved in the catalysis of methyl transfer by cytosine-DNA methyltransferases, the two cysteine residues in S. typhimurium CheR, Cys31 and Cys229, have been the focus of a number of mutagenesis and biochemical studies [30]. Substitution of Cys31 with serine resulted in an 80% decrease in methyltransferase activity. Furthermore, it was shown that inactivation of wild-type or Cys229→Ser mutant CheR by sulfhydryl reagents could be prevented by preincubation of the enzymes with AdoMet. In another report, Cys31 was photolabeled with S-adenosyl-L-[methyl-³H]methionine, suggesting that Cys31 was located at, or near, the AdoMet-binding site [31].

Surprisingly, neither Cys31 nor Cys229 is part of the active site. In the three-dimensional structure of CheR, Cys31 resides within the N-terminal domain, ~15 Å away from the AdoHcy-binding site. Thus, it appears that the observed decrease in activity of the Cys31→Ser mutant is a consequence of some effect other than direct participation of this residue in the catalytic reaction. We were unable to obtain crystals of CheR in the absence of cofactor, which may indicate a somewhat flexible nature of the molecule, specifically with respect to the interdomain connection. In fact, because of the buried nature of the AdoHcy-binding site (Fig. 2c), significant movement of the domains must occur to give the cofactor access to the binding cleft. Given that the N-terminal domain and linker region also contribute to AdoHcy binding, binding of cofactor may lock Cys31 into a less solvent-accessible position and thus protect it from sulfhydryl reagents. Conversely, the Cys31→Ser mutation may disrupt interdomain interactions and thus affect AdoMet binding.



Figure 4

S-adenosylhomocysteine-binding site in CheR. (a) A stereo diagram (MOLSCRIPT [57]) of the AdoHcy-binding site. Only sidechain atoms are included in the figure, except for the residues that form hydrogen bonds with AdoHcy through mainchain atoms. Hydrogen bonds are represented by dashed lines. (b) A schematic view of the contacts identified in the crystal structure of the CheR-AdoHcy complex. Hydrogen bonds are drawn with dashed lines and covalent bonds are shown as solid lines connecting the solid spheres that denote atoms. Residues within the hydrophobic pocket that accommodates the adenine portion of AdoHcy are represented by parallel curved lines. Sulfur atoms are shown in green; other atom colors are the same as in Figure 2.



Comparison of cofactor-binding sites in CheR and other 
methyltransferases 

Overall sequence similarity among methyltransferases that methylate different types of substrates is very weak. Analysis of the amino acid sequences of a broad range of AdoMet-dependent methyltransferases revealed only three short regions with sequence similarity [32]. It was suggested that the amino acids in these regions might have a common function, such as the binding of AdoMet/AdoHcy, or alternatively might be indicators of a common evolutionary origin. More recently, a structure-guided analysis revealed nine conserved sequence motifs specific to DNA amino-methyltransferases [24]. Even though the sequential order of these motifs varies among DNA amino-methyltransferases, their sequence similarity is significant. On the basis of structural and sequence comparisons, it has been proposed that many, and possibly all, AdoMet-dependent methyltransferases have a common catalytic-domain structure [33,25]. Recent publications have examined the primary sequence composition of the AdoMet/AdoHcy-binding site [4] and have addressed its relationship with the NAD-binding site of alcohol dehydrogenase [34]. Crystal structures are now available for members of all four major classes of methyltransferases: three DNA methyltransferases, HhaI [35], TaqI [2] and HaeIII [3]; an RNA methyltransferase, VP39 [4]; two small-molecule methyltransferases, catechol O-methyltransferase [5] and glycine methyltransferase [6]; and a protein methyltransferase, CheR. In addition, many more amino acid sequences for a variety of methyltransferases are now available, enabling us to address the questions of cofactor binding, evolutionary origin and possible identification of characteristic primary or tertiary structure elements.

We have superimposed all available methyltransferase structures and examined the AdoMet/AdoHcy-binding sites in order to carry out a comprehensive and detailed analysis. Table 3 summarizes the structural comparisons of the cofactor-binding sites by listing the residues within 4 Å of the cofactor molecule. We included only the DNA-bound structure of HhaI cytosine methyltransferase and not the DNA-free form. In the latter structure, the AdoMet molecule exhibited an inverted orientation within the binding cleft [1], which the authors subsequently suggested was probably a non-physiological phenomenon related to crystal packing [35]. Additionally, glycine methyltransferase was excluded from our analysis even though it exhibits a great deal of structural
similarity with other methyltransferases. AdoMet binds very differently in this enzyme, primarily because of the multimeric nature of glycine methyltransferase and the presence of an additional domain [6]. Glycine methyltransferase also binds tetrahydrofolate and polycyclic aromatic hydrocarbon molecules, indicating a much less AdoMet-specific binding character. The active site of this enzyme presumably represents an entirely different class of cofactor-binding sites.

The only residues common to all AdoMet-binding sites are an aspartate or a glutamate, which forms hydrogen bonds to the hydroxyl groups of the ribose, and, with the exception of VP39, an aspartate, glutamate or asparagine, which forms hydrogen bonds to the amino group of the adenine. These residues, however, are also found in alcohol dehydrogenase and other enzymes that bind NAD, a cofactor with adenine and ribose moieties identical to those of AdoMet/AdoHcy. Interestingly, in RNA methyltransferase VP39 there is also an aspartate residue in a corresponding position. This aspartate, however, points away from the AdoMet, and instead the adenine amino group forms a hydrogen bond with the carbonyl oxygen of the neighboring residue. Similarly to the adenine of NAD in NAD-dependent enzymes, the adenine ring of AdoMet/AdoHcy in all methyltransferases is situated within a hydrophobic pocket of variable residues. There is also variability in the methyltransferase residues that participate in hydrogen bonds with the carboxyl and amino groups of the methionine portion of AdoMet/AdoHcy.

Table 3

Methyltransferase residues involved in interactions with AdoMet/AdoHcy.

Binding-site feature        CheR        COMT*       HhaI†       HaeIII‡     TaqI§       VP39#

Hydrogen bonds with AdoHcy atoms¶
N6;Ade                      O;Asn212    O;Gln120    O;Asp60     O;Asp50     O;Asp89     O;Val116,m
N6;Ade                      O;Ala38,m   O;Ser119    -           -           -           -
N;HomoCys                   O;Ala123,m  O;Gly66,m   -           -           -           O;Gly68,m
N;HomoCys                   O;Glu129    O;Asp141    -           -           O;Glu45     O;Asp138
N;HomoCys                   O;Arg230,m  O;Ser72     -           -           -           O;Tyr66
O1;HomoCys                  -           -           N;Gly23,m   N;Gly12,m   -           -
O1;HomoCys                  -           -           O;Ser305    -           O;Glu22     -
O2;HomoCys                  O;Thr94     O;Ser72     O;Ser305    -           O;Thr23     N;Gln39
O2;HomoCys                  N;Arg98     N;Val42,m   N;Leu21,m   N;Ala10,m   -           N;Gly72,m
O2;Ribose                   O1;Asp154   O1;Glu90    O1;Asp40    O1;Glu29    O1;Glu71    O1;Asp95
O3;Ribose                   O2;Asp154   O2;Glu90    O1;Asp40    O1;Glu29    O2;Glu71    O2;Asp95

Hydrophobic interactions with adenine ring
                            Ile155      Met91       Phe18       Phe7        Ile72       Ile67
                            Leu213      His142      Trp41       Tyr30       Phe90       Phe115
                            Val232      Trp143      Ile61       Ile51       Pro107      Val116
                            Phe236      -           Pro80       Pro70       Phe146      Val139

Residues in β1/αA loop**    AAASTGE     LGAYCGY     LFAGLGG     LFSGAGG     PACAHGP     IGSAPGT

* Catechol O-methyltransferase. † HhaI cytosine methyltransferase. ‡ HaeIII cytosine methyltransferase. § TaqI adenine methyltransferase. # VP39 vaccinia protein RNA methyltransferase. ¶ Format: methyltransferase atom; residue, with m indicating a mainchain atom. ** The residues between the first and last residues listed belong to the loop; the first and last residues flank the loop and correspond to the end of β1 and the beginning of αA, respectively; this region corresponds to the FGLGG sequence in alcohol dehydrogenase.
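The selection criterion behind Table 3, residues with at least one atom within 4 Å of the cofactor, amounts to a pairwise distance test. A minimal sketch follows, assuming each atom is given as a (residue label, atom name, x, y, z) tuple in ångströms; the tuple layout, the function name and the coordinates in the example are hypothetical and are not taken from the CheR model or any deposited structure.

```python
import math


def residues_near(ligand_atoms, protein_atoms, cutoff=4.0):
    """Sorted labels of protein residues with at least one atom within
    `cutoff` angstroms of any ligand atom.

    Every atom is a (residue_label, atom_name, x, y, z) tuple.
    """
    near = set()
    for res, _name, x, y, z in protein_atoms:
        for _lres, _lname, lx, ly, lz in ligand_atoms:
            if math.dist((x, y, z), (lx, ly, lz)) <= cutoff:
                near.add(res)
                break  # one close ligand atom is enough for this protein atom
    return sorted(near)


# Hypothetical coordinates (angstroms), not taken from any deposited model.
ligand = [("LIG", "N6", 0.0, 0.0, 0.0)]
protein = [
    ("ResA", "OD1", 2.9, 0.0, 0.0),   # 2.9 A away
    ("ResB", "CD1", 3.8, 0.5, 0.0),   # about 3.83 A away
    ("ResC", "NH1", 6.0, 0.0, 0.0),   # 6.0 A away, outside the cutoff
]
print(residues_near(ligand, protein))  # ['ResA', 'ResB']
```

The double loop is quadratic in the number of atoms, which is adequate for a single cofactor; a spatial grid or k-d tree would be the usual choice for whole-structure contact searches.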

Based on examination of the structures, it appears that the size and shape of the AdoMet-binding clefts are very similar in all methyltransferases, despite the apparent lack of sequence identity. Importantly, in all of the methyltransferases, the binding cleft is formed mainly by residues from the α/β domain, with some additional contributions from residues within the linker region that connects the α/β domain to an additional domain, which is commonly involved in determining substrate specificity. The loop connecting the first β strand in the α/β domain to the following α helix is well characterized in NAD-binding enzymes, as it contains the conserved sequence motif Gly-X-Gly-Gly, in which the last glycine is the first residue of the α helix. In methyltransferases, the sequence of this loop is much less conserved. The loop is two residues longer than that of alcohol dehydrogenase, which greatly affects the position of the connecting α helix and determines the shape of the floor of the AdoMet-binding cleft, over which the methionine portion of the cofactor is positioned. The sequences involved in forming this loop are listed in Table 3. A common feature of all methyltransferases is that the loop ends with a glycine residue, and in all but RNA methyltransferase VP39, the third residue within the loop displays specific phi/psi values (approximately phi = 50° and psi = -130°) falling into a disallowed region of the Ramachandran plot, regardless of the amino acid type. The unusual phi/psi values are a consequence of these residues being part of a type II hairpin structure. In VP39, the fourth residue in the loop is a cis proline; this changes the hydrogen-bonding scheme of the loop and introduces a small displacement in the αA helix that is not observed in the other methyltransferases. However, the overall shape of the β1/αA loop in VP39 is still highly similar to that in all other methyltransferase structures.

552 Structure 1997, Vol 5 No 4
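The phi/psi values quoted above are backbone dihedral angles: phi is defined by the atoms C(i-1), N, Cα and C, and psi by N, Cα, C and N(i+1). A minimal sketch of the standard four-point dihedral calculation follows. The coordinates in the example are synthetic, and the 30° tolerance used to compare an angle pair with the approximate values reported here (phi ≈ 50°, psi ≈ -130°) is an arbitrary, hypothetical choice.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for four points; a clockwise
    rotation viewed along p1 -> p2 is positive (IUPAC convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def near_reported(phi, psi, target=(50.0, -130.0), tol=30.0):
    """True if (phi, psi) lies within `tol` degrees of `target` in both angles."""
    return (abs(angle_diff(phi, target[0])) <= tol
            and abs(angle_diff(psi, target[1])) <= tol)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
print(near_reported(55.0, -125.0), near_reported(-60.0, -45.0))  # True False
```

Wrapping differences through `angle_diff` matters because dihedral angles are periodic: 170° and -170° are only 20° apart.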

Despite the highly similar topologies of methyltransferases and NAD-binding enzymes, the β1/αA loops, which are of great importance for both classes of enzymes, exhibit very different conformations. We have examined this region in all available structures containing nucleotides in Rossmann-type α/β folds. The majority of structures contain a β1/αA loop that is three residues long, similar to alcohol dehydrogenase; a smaller number of structures contain a loop that is four residues long, similar to that found in cholesterol dehydrogenases. Within each of the groups represented by alcohol dehydrogenase and cholesterol dehydrogenases, the backbone atoms of the 16 residues that form the β1-loop-αA region overlay very closely, with rms deviations ranging from 0.33 Å to 0.86 Å and from 0.67 Å to 1.03 Å, respectively. In the structure of enoyl acyl carrier protein reductase, although the loop is six residues long, the β strand and α helix overlay well with those of the other enzymes (rms deviation 0.67 Å). Structural alignment of the β1-loop-αA region of CheR (residues 118-134) with the corresponding residues of the other methyltransferase structures gives backbone rms deviations ranging from 0.69 Å (CheR and catechol O-methyltransferase) to 1.06 Å (CheR and TaqI methyltransferase). Figure 5 shows α-carbon models of these loops from the methyltransferases and of the loop region of the alcohol dehydrogenase structure, aligned by superimposing the β1 strand and the first two loop residues onto the corresponding structure of CheR. The longer loop found in the methyltransferases creates the space needed for binding AdoMet, and, within the cleft, the α-amino and α-carboxyl groups of methionine form hydrogen bonds with whatever protein atoms are available within the appropriate distances. In different methyltransferases, the loop not only dictates the shape of the binding cleft but also either provides the specific backbone atoms involved in hydrogen bonding to the cofactor or positions the sidechains at the end of β1 or at the beginning of αA for formation of hydrogen bonds (Table 3). Even though there is sequence conservation within the DNA methyltransferases, the comparison of diverse and unrelated methyltransferases reveals that it is not the specific sequence alone, but rather the specific length and conformation of the β1/αA loop, that allows AdoMet to bind. In most of the methyltransferases, the presence of a second domain also contributes to formation of the binding cleft and to the overall binding energy of AdoMet.
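The rms deviations quoted above summarize how closely paired backbone atoms overlay once two structures have been superimposed. A minimal sketch follows, assuming the two coordinate lists are already superimposed and paired atom for atom; the optimal rotation itself (for example, by the Kabsch method) is not shown, and the coordinates are synthetic.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. angstroms)
    between two equally long lists of paired (x, y, z) coordinates that
    have already been superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Synthetic example: every atom displaced by 1.0 A along z.
loop_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
loop_b = [(x, y, z + 1.0) for x, y, z in loop_a]
print(round(rmsd(loop_a, loop_b), 2))  # 1.0
```

Because the value depends on which atoms are paired, comparisons such as those above are only meaningful when the same backbone atoms and residue ranges are used for every pair of structures.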

It is now possible to examine the previously identified regions of primary sequence similarity [32] with respect to the three-dimensional structures of the methyltransferases. Two sequences, designated consensus regions II and III, are located at the beginnings of β strands four and five, respectively, and are distant from the AdoMet-binding site. Examination of a larger number of methyltransferase sequences now suggests that these regions are much less conserved than indicated by the original alignment. The residues that are fairly conserved (aspartate from region II and lysine/arginine from loop region III) sometimes, but not always, form a salt bridge. The aspartate residue is also present in a large number of dehydrogenases. Another sequence, designated consensus region I, was identified primarily among DNA methyltransferases. The residues in region I are involved in formation of the specific β1/αA loop discussed above. As we have already concluded, even in methyltransferases in which the specific consensus region I sequence is not present, the length and conformation of the loop are conserved. It is most likely that the consensus region I sequence reflects a common origin of these enzymes rather than a requirement for AdoMet binding.

Comparison with CheR homologs from other organisms 

The nucleotide sequences of a number of methyltransferase genes displaying sequence similarity with CheR have been determined from a diverse array of bacterial species. An alignment of the predicted amino acid sequences, together with the secondary structure of S. typhimurium CheR, is shown in Figure 6. Overall, the sequences exhibit significant similarity, with identity to S. typhimurium CheR ranging from 87% for the closely related CheR methyltransferase from Escherichia coli to 25% for a hypothetical protein from Campylobacter jejuni. Several features are apparent from the alignment. Sequence similarity among the large C-terminal domains is much stronger than among the small N-terminal domains. The region comprising the N-terminal domain differs in length in proteins from different organisms. In some proteins, the sequence extends beyond the C terminus of S. typhimurium CheR. As might be expected, there are variations in the lengths of regions corresponding to a few of the surface loops. Notably, there is only weak similarity in the region of sequence corresponding to the antiparallel β-sheet subdomain of S. typhimurium CheR (residues 167-200). However, as discussed previously in relation to AdoHcy binding, the residues that form hydrogen bonds to AdoMet/AdoHcy are strictly conserved, and residues that form the hydrophobic pocket for the adenine moiety of the cofactor have conserved aliphatic sidechains.
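Percent identity values such as those above depend on how gapped columns are counted. A minimal sketch follows, assuming two rows of a gapped alignment and counting identities only over columns where neither sequence has a gap. This convention is an assumption; PILEUP and other programs may normalize differently (for example, by the length of the shorter sequence), and the short sequences in the example are made up.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two aligned sequences of equal length,
    counted over columns where neither sequence has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        aligned += 1
        identical += a == b
    return 100.0 * identical / aligned if aligned else 0.0


print(percent_identity("AC-GT", "ACAGA"))  # 75.0
```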

The biological and structural implications of the sequence comparison are that the overall fold, domain structure and mode of cofactor binding are conserved among these methyltransferases. With the exception of the proteins from Rhodobacter capsulatus, C. jejuni and Pseudomonas fluorescens, the proteins presented in Figure 6 have been shown to be associated with chemotaxis or other motility systems, and are presumably involved in modification of receptors or receptor homologs. The substantial divergence in some
Figure 5

Comparison of the cofactor-binding loops of the methyltransferases and alcohol dehydrogenase. A stereo view of α-carbon traces of the β1/αA cofactor-binding loops from six different methyltransferase crystal structures (CheR residues 119-134, yellow; catechol O-methyltransferase, peach; HhaI DNA methyltransferase, dark blue; HaeIII DNA methyltransferase, magenta; TaqI DNA methyltransferase, green; and RNA methyltransferase VP39, gray) and alcohol dehydrogenase (light blue), superimposed. The structure of the AdoHcy molecule from the complex with CheR is also shown as a ball-and-stick model, with the colors: yellow for carbon, blue for nitrogen, red for oxygen and green for sulfur. The figure was prepared using RIBBONS.




regions of the methyltransferase sequences may allow for recognition of specific receptors. In different organisms, chemotaxis receptors exhibit significant variability; thus, the specific sequences of the methylation enzymes might allow them to adopt conformations necessary to recognize unique features of their substrates. The lower levels of sequence conservation in the N-terminal domain and in the antiparallel β-sheet subdomain of the C-terminal domain are consistent with the hypothesis that these regions may be involved in interactions with the chemoreceptors.

Interaction between the methyltransferase and its receptor 
substrate 

CheR catalyzes the carboxyl methylation of specific glutamyl sidechains within the cytoplasmic domains of the chemotaxis transmembrane receptors of enteric bacteria, such as E. coli and S. typhimurium. The substrates of CheR are somewhat diverse: it recognizes four to five different glutamyl residues within each of at least five different receptors. Comparison of these sites has revealed a methylation consensus sequence, Glu-Glu-X-X-Ala-Ser/Thr, with methylation occurring at the second glutamyl residue [36-39]. Although there is a significant amount of sequence similarity, and presumably structural homology, at the different methylation sites [40,38,41], there must also be differences, as reflected by the different rates of methylation observed at each site [21,22,42].
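The methylation consensus can be written as a regular expression over one-letter residue codes, EE..A[ST], in which the second glutamate is the methylated residue. The sketch below scans a sequence for that pattern, using a lookahead so that overlapping matches are also reported. The function name and the test sequence are hypothetical (the example is not a real receptor sequence), and a sequence match alone does not establish that a site is methylated.

```python
import re

# Glu-Glu-X-X-Ala-Ser/Thr; the lookahead allows overlapping matches.
CONSENSUS = re.compile(r"(?=(EE..A[ST]))")


def candidate_sites(sequence):
    """1-based positions of the second Glu (the methylated residue)
    in every match to the consensus Glu-Glu-X-X-Ala-Ser/Thr."""
    return [m.start() + 2 for m in CONSENSUS.finditer(sequence.upper())]


print(candidate_sites("MKEEAAASLLQEEVRATG"))  # [4, 13]
```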

In the methyltransferase-substrate (inhibitor) complexes for which structures have been determined, the smaller domain of the methyltransferase is involved in binding the substrate and positioning it so that the substrate methylation site is presented to the methyltransferase active site located in the large α/β domain. In the HhaI and HaeIII methyltransferases, DNA binds in a cleft formed between the small and large domains [35,3], whereas catechol O-methyltransferase binds the relatively small inhibitor 3,5-dinitrocatechol in a shallow groove of the large domain, with hydrophobic interactions contributed by a residue from a loop of the small domain [5]. Despite their common participation in binding substrate, the folds of the small domains of these enzymes are quite distinct. By analogy with these methyltransferases, the N-terminal domain of CheR may participate in interactions with regions of the receptors that contain the methylation sites.

The methyltransferase CheR binds tightly to the cytoplasmic domains of the chemotaxis receptors [43,44]. The binding site has recently been localized to a five amino acid sequence, Asn-Trp-Glu-Thr-Phe, located at the extreme C termini of the E. coli and S. typhimurium aspartate receptors (Tar) and the E. coli serine receptor (Tsr). The intact receptor and a synthetic pentapeptide of the binding-site motif exhibit similar binding affinities (Ka ~ 4 × 10⁵ M⁻¹) [23]. On the basis of this observation and the lack of conservation of this binding motif among all of the methylated receptors, a model for intermolecular receptor methylation has been proposed. In this model, CheR is tethered to the C-terminal tail of one receptor dimer, from where it methylates an adjacent, and perhaps heterologous, receptor dimer in the membrane. Although at this time we have no knowledge of the receptor-peptide binding site on the methyltransferase, the antiparallel β-sheet insertion in the large α/β domain provides an intriguing candidate. This subdomain appears to be less ordered than the rest of the molecule, with average backbone B values of 28 Å² (residues 170-195) compared with 20 Å² for the rest of the α/β domain. The β subdomain does not appear to be an integral part of the overall fold, and has only minimal interactions with the
Figure 6

Sequence alignment of CheR homologs from different organisms. The secondary structural elements of S. typhimurium CheR are shown as arrows (β strands), bars (α helices) and lines (connecting loops) aligned above the amino acid sequences. Sites where identical amino acid residues occur in 75% or more of the proteins are boxed. Residues forming the linker are labeled with stars. Arginine residues from helix α2, which are postulated to interact with the chemotaxis receptors in E. coli and S. typhimurium, are underlined. Vertical arrows indicate residues that form hydrogen bonds with AdoHcy. Accession numbers for the sequences are: Salmonella typhimurium (SP, P07801); Escherichia coli (SP, P07364); Pseudomonas fluorescens (GB, L29642); Rhodobacter capsulatus (SP, Q02998); Bacillus subtilis (GB, M80245); Myxococcus xanthus (SP, P31759); Rhizobium meliloti (GB, U13166); Rhodobacter sphaeroides (PIR, S47261); Vibrio anguillarum (GB, U36378); Pseudomonas aeruginosa (GB, U11382); Campylobacter jejuni (SP, P45676); Rhodospirillum centenum (GB, U64519); and Aerobacter aerogenes (SP, P21824). SP = SWISSPROT; GB = GenBank. The program PILEUP from the Genetics Computer Group Wisconsin package version 8 was used for the alignments [58].



large domain (see Fig. 2 and description of protein architecture). Nor can the presence of this subdomain be rationalized in terms of catalysis at the active site, as it makes no contacts with the bound cofactor.

We have recently obtained co-crystals of CheR in complex with the N-acetylated receptor pentapeptide (SD, unpublished results). Notably, crystals of the complex cannot be obtained under the conditions used to grow crystals of CheR alone. Furthermore, crystals of the complex have a different morphology and different cell constants from those of CheR, suggesting that the peptide influences either the conformation of CheR or specific lattice contacts between methyltransferase molecules. Solution of the structure of the CheR-receptor-peptide complex should elucidate the peptide-binding site on CheR and provide a foundation for further investigations of methyltransferase-receptor interactions.

Catalytic mechanism 

Biochemical analyses of the E. coli and S. typhimurium methylation systems have indicated that the methylation reaction catalyzed by CheR proceeds by a random mechanism [28]. The turnover number of 10 min⁻¹ and the absence of a covalent enzyme-substrate intermediate support a reaction model involving direct transfer of a methyl group from AdoMet to the glutamyl carboxyl of a specific sidechain of a chemotaxis receptor. It is also known that direct methyl transfer reactions occur by an SN2 type of mechanism [45]; thus, the reaction rates are strongly dependent on the characteristics of the nucleophile, in this case the receptor glutamate carboxylate oxygen. The active site for methyl transfer necessarily involves residues from both CheR and the region of the receptor that is methylated. On the basis of our knowledge of the reaction mechanism and the insights provided by the structure of the CheR-AdoHcy complex, we propose the following: firstly, binding of the methylation region of the chemotaxis receptor occurs within the wide opening of CheR formed by the central AdoMet-binding site, flanked on either side by the β subdomain and the N-terminal helical domain; secondly, formation of the Michaelis complex follows a conformational change in CheR and possibly in the substrate as well; and thirdly, reaction rates depend on the specific presentation of the receptor glutamate residues to the interaction surface of CheR, which probably involves the positively charged helix α2.

In the structure of the CheR-AdoHcy complex, the cofactor is fully buried in the binding cleft of the protein, with no atoms accessible to solvent. The inclusion of a methyl group at the sulfur atom of homocysteine, which would correspond to a molecule of AdoMet, might require some movement of the surrounding sidechains, such as Tyr235 or even Arg89. Binding of the receptor substrate could entail additional conformational changes necessary to properly position and orient the reactive groups. Such movement has previously been observed in the complex of DNA with HhaI DNA methyltransferase, in which binding of DNA to the enzyme was associated with a significant conformational change within the DNA-binding domain as well as distortion of the DNA helix, with the methylatable base completely flipped out of the helix [35]. The intramolecular interface between the N-terminal and C-terminal domains of CheR is relatively small (about 400 Å²) and is built primarily of several hydrogen bonds and a few hydrophobic interactions. These domains could potentially move in some type of hinge motion, forming a complex with the receptor substrate or perhaps opening the active site to allow binding of AdoMet.

The observed differences in rates of carboxyl methylation at different glutamate residues can be explained by two different effects, or perhaps by a combination of both. One possibility is that the reaction rates are determined by the strength of the nucleophile (the carboxyl oxygen) or by the mobility of the leaving methyl group, both of which can differ depending on other factors. This would suggest the involvement of an additional residue that would specifically stabilize a transition state. For example, an oxygen of a glutamate or tyrosine residue might interact with the positively charged sulfur atom, as is postulated in the reaction mechanism of glycine methyltransferase, or some positively charged residue might interact with the glutamyl substrate (analogous to the role of Mg²⁺ in catechol O-methyltransferase). The presence of a receptor methylation consensus sequence, Glu-Glu*-X-X-Ala-Ser/Thr (X = any residue; Glu* = methylated residue), implies, however, that the active-site residues contributed by both CheR and the receptor are similar for all four methylatable glutamate residues. Hence, it is more likely that the reaction rates are determined by another mechanism, such as the availability, specific orientation and positioning of the glutamate residues for methylation, rather than by a different composition of active-site residues. This interpretation is also supported by the mutagenesis studies of Shapiro and Koshland, which have shown that the shorter carboxylate sidechain of aspartate cannot be methylated and that substitution of a methylatable glutamate with aspartate drastically decreases the rate of methylation at the glutamate residue N-terminal to the mutation [21].

Calculated electrostatic potentials (GRASP) [46] of the CheR surface revealed that the proposed surface area of interaction with the receptor is positively charged overall. This region includes an approximately 12 Å wide positively charged surface leading to the AdoHcy-binding pocket (Fig. 7). Charges in this area are provided by helix α2; three arginine residues (53, 56 and 59) are lined up on one face of the helix and are oriented towards the proposed receptor-interaction opening. Methylatable glutamate residues on the receptor are followed by two to three non-methylatable glutamates in the first α helix and preceded by four to five non-methylatable glutamates in the second α helix of the predicted antiparallel coiled coil of the receptor cytoplasmic domain [41]. It is possible that these residues, all of them seven residues apart, form a negatively charged surface that is involved in specific interactions with the positively charged surface of CheR, and that this complementary electrostatic interaction serves to position the methylatable glutamates within the active site.
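The argument that Arg53, Arg56 and Arg59 line up on one face of helix α2, and that receptor glutamates seven residues apart could form a continuous surface, rests on helical-wheel geometry. The sketch below assumes an ideal α helix with 100° of rotation per residue and, for the coiled-coil case, 3.5 residues per turn; the function names are hypothetical, and real helices (α2 in particular is distorted in the crystal) deviate from these ideal values.

```python
def wheel_angle(residue, reference, deg_per_residue=100.0):
    """Angular position (degrees, 0-360) of `residue` on a helical wheel,
    measured from `reference`; 100 degrees per residue is an ideal alpha helix."""
    return ((residue - reference) * deg_per_residue) % 360.0


def offset_deg(angle):
    """Angular distance (0-180 degrees) of a wheel angle from 0."""
    angle %= 360.0
    return min(angle, 360.0 - angle)


def arc_span(angles):
    """Smallest arc (degrees) containing all of the given wheel angles."""
    pts = sorted(a % 360.0 for a in angles)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(360.0 - pts[-1] + pts[0])  # gap that wraps past 360
    return 360.0 - max(gaps)


# Arg53, Arg56 and Arg59 of helix alpha2, placed on an ideal alpha helix.
arginines = [wheel_angle(r, 53) for r in (53, 56, 59)]
print(arginines, arc_span(arginines))  # [0.0, 300.0, 240.0] 120.0

# Residues seven apart: 20 degrees off on an ideal alpha helix,
# essentially 0 degrees with 3.5 residues per turn (coiled-coil heptad).
print(offset_deg(wheel_angle(7, 0)))                       # 20.0
print(round(offset_deg(wheel_angle(7, 0, 360 / 3.5)), 6))  # 0.0
```

Under these ideal assumptions the three arginines fall within a 120° arc of the wheel, and residues spaced seven apart in a coiled coil return to the same face after two turns.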

It should be noted that helix α2 is significantly distorted, owing to hydrogen bonds formed between sidechain atoms of its residues and mainchain atoms of the same or a neighboring molecule (lattice contacts). This observation suggests that the helix is flexible and could adopt slightly different conformations that might be necessary for interaction with the receptor.

According to the model of Wu et al. [23], the receptor substrate, via its C-terminal pentapeptide, will bind

Data collection

The crystals belong to the monoclinic space group P2₁, with cell dimensions a = 55.0 Å, b = 48.0 Å, c = 63.2 Å and β = 112.3°. There is one CheR molecule in the asymmetric unit and the solvent content is 40%.

All data sets were collected at room temperature on an R-Axis II (Rigaku) image plate detection system mounted on a Rigaku rotating anode generator operated at 50 mA and 100 kV, with double mirror focusing (Molecular Structure Corporation). Data were integrated using the program Denzo [48]. All data were merged, scaled and truncated with the Rotavata/Agrovata and Truncate programs in the CCP4 suite [49]. Potential heavy-atom derivative data were scaled to the native data set using the program Scaleit in CCP4.

MIR phasing, model building and refinement 
Difference Patterson maps, calculated with derivative and native pH 5.6 data, were generated and examined using the programs Topdel, Fsfour and Mapview from the PHASES program suite [50]. Coefficients for a cross-difference Fourier synthesis were calculated with the program Mrgdf (PHASES). Heavy-atom sites were identified by examination of difference Patterson maps and subsequently validated by cross-difference analysis.

We were able to obtain only mercury and K₂PtCl₄ derivatives, despite an extensive search through a variety of compounds. All of the mercury derivatives had two major and mainly overlapping sites, which in the final model could be correlated with the positions of Cys31 and Cys229. Combination of all mercury derivatives and solvent flattening of the resulting phases yielded a poor electron-density map that, apart from detectable features for two α helices, was not interpretable. Platinum atoms from K₂PtCl₄ bound very strongly to CheR crystals, with the main site at 0.0, 0.25, 0.45 (relative to the main Hg site). Because of the specific positions of the Pt atoms, Pt-derived phases were pseudo-centrosymmetric, and when used in the cross-difference Fourier method they gave an ambiguous solution in the y direction for the mercury positions, even though the x and z coordinates of the mercury atoms were confirmed. An interpretable electron-density map was obtained with phases derived by combination of all of the derivatives followed by density modification procedures.

The positions and occupancies of the heavy atoms for each of the derivatives were refined individually in MLPHARE (CCP4) [51] and then combined, giving a figure of merit of 0.71. The handedness of the structure was determined using anomalous data for one of the mercury derivatives. MIR phases were calculated for data between 35.0 and 3.0 Å resolution. These phases were subjected to density modification using the program DM (CCP4) [52], which included solvent flattening, histogram matching and Sayre's equation options. Exclusion of any of the mercury derivatives resulted in much poorer quality of the phases.
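The figure of merit quoted for the combined phases is, in its usual definition, the length of the probability-weighted mean phase vector for each reflection. A minimal sketch for a single reflection, assuming its phase probability distribution has already been sampled on a grid of angles (how MLPHARE builds that distribution from the heavy-atom model is not reproduced here). A bimodal distribution, such as pseudo-centrosymmetric Pt phases produce, gives a figure of merit near zero:

```python
import math

def best_phase_and_fom(phases_deg, probs):
    """Return (best phase in degrees, figure of merit m) for one reflection.

    m = |sum P(phi) exp(i phi)| / sum P(phi); the best phase is the
    argument of that probability-weighted centroid."""
    total = sum(probs)
    x = sum(p * math.cos(math.radians(phi)) for phi, p in zip(phases_deg, probs)) / total
    y = sum(p * math.sin(math.radians(phi)) for phi, p in zip(phases_deg, probs)) / total
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

grid = list(range(0, 360, 10))
sharp = [1.0 if phi == 90 else 0.0 for phi in grid]           # unambiguous phase
bimodal = [1.0 if phi in (90, 270) else 0.0 for phi in grid]  # two equal, opposite solutions
print(best_phase_and_fom(grid, sharp))                        # phase ~90, m ~1
print(round(best_phase_and_fom(grid, bimodal)[1], 6))         # 0.0
```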

MIR/DM phases were used to calculate electron-density maps with data from 25–4 Å and 25–3 Å. A number of α helices and β strands were clearly identified. Both maps were used for the initial Cα-chain tracing, aided by the bones option within the graphics program Turbo-Frodo [53]. Turbo was used for all of the map interpretation, model building and subsequent map fittings. There were several regions of ambiguity within the maps. Great improvement in the quality of the electron-density map was achieved by using the iterative skeletonization process implemented in DM. At this time, the phases were extended to 2.7 Å. The resulting electron-density map enabled us to trace 243 out of 287 residues. Residues 1–11 and 285–287 were missing, and the region between residues 166 and 200 fell within disordered electron density that was difficult to interpret. The partial model was taken through several rounds of positional and simulated-annealing refinement in X-PLOR [54], followed by refitting of the electron-density maps. Model rebuilding was done by examining SIGMAA-weighted maps, and 3Fo−2Fc and original MIR/DM electron-density maps. In the first few cycles, refinement was carried out with data from 8–3.0 Å, after which the resolution was extended to 2.4 Å. Calculated phases from the improved model revealed better features in a 3Fo−2Fc electron-density map, such that it was possible to trace residues 167–199 and confirm sidechain assignments. The new model, which included residues 11–284 as well as AdoHcy, was refined in X-PLOR starting with 2.4 Å and then including data to 2.0 Å, resulting in a working R factor of 0.277 and a free R factor of 0.367, without any water molecules and B factors included in refinement. This model was then refined against the native data set at pH 7.0, which has slightly better statistics and higher completeness compared to the pH 5.6 data set. Water molecules were added using unrestrained and restrained refinement of ARP [55]. Water molecules were carefully examined and another round of X-PLOR refinement was carried out, yielding the final model.

Accession numbers 

Atomic coordinates have been deposited in the Brookhaven Protein Data Bank with the code 1af7.
ABSTRACT



We have determined the structure of PvuII methyltransferase (M.PvuII) complexed with S-adenosyl-L-methionine (AdoMet) by multiwavelength anomalous diffraction, using a crystal of the selenomethionine-substituted protein. M.PvuII catalyzes transfer of the methyl group from AdoMet to the exocyclic amino (N4) nitrogen of the central cytosine in its recognition sequence 5'-CAGCTG-3'. The protein is dominated by an open α/β-sheet structure with a prominent V-shaped cleft: AdoMet and catalytic amino acids are located at the bottom of this cleft. The size and the basic nature of the cleft are consistent with duplex DNA binding. The target (methylatable) cytosine, if flipped out of the double helical DNA as seen for DNA methyltransferases that generate 5-methylcytosine, would fit into the concave active site next to the AdoMet. This M.PvuII α/β-sheet structure is very similar to those of M.HhaI (a cytosine C5 methyltransferase) and M.TaqI (an adenine N6 methyltransferase), consistent with a model predicting that DNA methyltransferases share a common structural fold while having the major functional regions permuted into three distinct linear orders. The main feature of the common fold is a seven-stranded β-sheet (6↓7↑5↑4↑1↑2↑3↑) formed by five parallel β-strands and an antiparallel β-hairpin. The β-sheet is flanked by six parallel α-helices, three on each side. The AdoMet binding site is located at the C-terminal ends of strands β1 and β2, and the active site is at the C-terminal ends of strands β4 and β5 and the N-terminal end of strand β7. The AdoMet–protein interactions are almost identical among M.PvuII, M.HhaI and M.TaqI, as well as in an RNA methyltransferase and at least one small molecule methyltransferase. The structural similarity among the active sites of M.PvuII, M.TaqI and M.HhaI reveals that catalytic amino acids essential for cytosine N4 and adenine N6 methylation coincide spatially with those for cytosine C5 methylation, suggesting a mechanism for amino methylation.

INTRODUCTION 

DNA methyltransferases (Mtases) transfer a methyl group from S-adenosyl-L-methionine (AdoMet) to a given position of a particular DNA base within a specific DNA sequence. The resulting methylation can protect the DNA from a cognate restriction endonuclease or can have epigenetic effects on gene expression. The DNA Mtases belong to two families: one methylates C5, a ring carbon of cytosine, yielding 5-methylcytosine (5mC), while the second family methylates the exocyclic amino group (NH₂) of cytosine or adenine, yielding N4-methylcytosine (N4mC) or N6-methyladenine (N6mA), respectively. Two of the 5mC Mtases have been structurally characterized as covalent reaction intermediate complexes with their DNA substrates (1,2); one of these, M.HhaI, has been characterized in complexes with structural analogs of DNA in three different methylation states: unmethylated, hemimethylated and fully methylated (3,4).

The primary sequences of the 5mC Mtases share a set of conserved motifs (I–X) in a constant linear order (5–9). The majority of these motifs are responsible for three basic functions of the 5mC Mtases: AdoMet binding, sequence-specific DNA binding and catalysis of methyl transfer. In contrast, the amino-Mtases (which generate N6mA or N4mC) belong to three groups characterized by distinct linear orders for the conserved motifs (10). The three groups are named α (including Mtases such as Dam), β (including Mtases such as M.PvuII) and γ (including Mtases such as M.TaqI). To date only one DNA amino-Mtase has been structurally characterized, the group γ N6mA Mtase M.TaqI (11).

While the M.TaqI structure has been determined only in the absence of DNA, it is sufficient to allow general structural comparison with the 5mC Mtases. Both M.HhaI and M.TaqI are bilobal structures: one lobe contains a catalytic domain with both the active site for methyl transfer and the AdoMet binding site, and the other lobe contains a target (DNA) recognition domain (TRD). The catalytic domains of the two proteins exhibit very similar three-dimensional folding (12). This folding pattern is also present in M.HaeIII, another 5mC Mtase; in catechol O-Mtase, a single-domain small molecule AdoMet-dependent Mtase; in VP39, an mRNA cap-specific RNA 2'-O-Mtase; and in glycine N-Mtase (2,13–15). The folding similarity includes the positions of conserved amino acid side chains involved in either AdoMet binding or catalysis; only the binding of AdoMet reported for glycine N-Mtase differs from the consensus pattern (15). Guided by this common catalytic domain structure, sequence alignment of amino-Mtases suggests that, for all amino-Mtases to fit the consensus M.HhaI/M.TaqI catalytic domain structure despite having different motif orders, different sets of topological connections would be required for the three DNA amino-Mtase groups (10).

*To whom correspondence should be addressed at present address: Department of Biochemistry, Emory University School of Medicine, 1510 Clifton Road, Atlanta, GA 30322, USA. Tel: +1 404 727 8491; Fax: +1 404 727 3746; Email: xcheng@emory.edu

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Determining the structure of PvuII methyltransferase (M.PvuII), a group β N4mC Mtase, would thus address two important questions about DNA Mtases. First, do the N4mC Mtases in fact match the consensus catalytic domain structure seen in M.TaqI and M.HhaI (12)? Second, are the major structural elements of amino-Mtases connected in three different orders, as suggested by their primary sequences (10)? M.TaqI itself did not provide a strong test for this model because the group γ and 5mC Mtases have essentially the same motif order; they differ only in the position of motif X (10). No Mtase from group α or β has been structurally characterized before this report.

M.PvuII, part of the restriction-modification system from the Gram-negative bacterium Proteus vulgaris (16), modifies the internal cytosine of the recognition sequence 5'-CAGCTG-3' (17) to generate N4mC (18). PvuII endonuclease, which cleaves duplex DNA at the center of the same recognition sequence to generate blunt-ended products, was structurally characterized earlier (19,20). With this report, the PvuII restriction-modification system becomes the first system for which the structures of the cognate endonuclease and methyltransferase have both been determined.
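Because 5'-CAGCTG-3' is its own reverse complement, the methyltransferase and the endonuclease see the same site on both strands. The endonuclease cuts at the center (CAG|CTG) to leave blunt ends, and the internal cytosine targeted for N4 methylation sits immediately 3' of that cut on each strand. A small sketch of this bookkeeping on a top-strand string; the function names and the example sequence are hypothetical:

```python
SITE = "CAGCTG"                          # PvuII recognition sequence, 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_sites(seq):
    """0-based start positions of SITE on the top strand.

    SITE equals its own reverse complement, so every top-strand hit is also
    the bottom-strand site and scanning one strand is enough."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]

def blunt_cut(seq):
    """Split the top strand between the central G and C of every site (CAG|CTG)."""
    pieces, previous = [], 0
    for start in find_sites(seq):
        pieces.append(seq[previous:start + 3])
        previous = start + 3
    pieces.append(seq[previous:])
    return pieces

def target_cytosines(seq):
    """Top-strand index of the internal cytosine (CAG-C-TG) of each site."""
    return [start + 3 for start in find_sites(seq)]

print(reverse_complement(SITE) == SITE)   # True
print(blunt_cut("GGCAGCTGAA"))            # ['GGCAG', 'CTGAA']
print(target_cytosines("GGCAGCTGAA"))     # [5]
```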



MATERIALS AND METHODS 

Overexpression and crystallization 

Overexpression and purification of M.PvuII, and selenomethionine (SeMet) incorporation into it, have been described previously (21). To crystallize the M.PvuII–AdoMet binary complex, 0.2 mM AdoMet was added to the pre-purified protein (~5 μM) and the mixture was further purified by cation exchange chromatography (21). M.PvuII and selenomethionyl M.PvuII, complexed with AdoMet, both crystallized in the monoclinic space group P2₁ with unit cell dimensions of a = 48.8 Å, b = 112.4 Å, c = 59.3 Å and β = 109.2° (21). There are two molecules per crystallographic asymmetric unit, termed molecules A and B. X-ray diffraction data were collected using a MarResearch imaging plate detector on beamline X12-C at the National Synchrotron Light Source, Brookhaven National Laboratory, and processed using the HKL software package (22). Multiwavelength anomalous diffraction (MAD; 23) data to 3.3 Å resolution (Table 1) were collected on a single frozen SeMet crystal at three different wavelengths, corresponding to the inflection point λ1 (minimum f′) and the peak λ2 (maximum f″) of the Se-containing crystal absorption spectrum, and a third wavelength (λ3) remote from the peak position (21). A higher resolution data set (up to 2.8 Å) used for final model refinement was collected from a native crystal (λ4 = 1.072 Å, 180° rotation, 1.5° increment, 90 s exposure).
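The wavelengths λ1 to λ4 and the X-ray energies listed in Table 1 are related by E = hc/λ, with hc ≈ 12 398.4 eV·Å. A small sketch of the conversion; the constant is the standard physical value, and because the authors do not state which value or rounding they used, the last digit may differ by one from the table:

```python
HC_EV_ANGSTROM = 12398.42  # h*c in eV*Angstrom (approximate physical constant)

def energy_ev(wavelength_angstrom):
    """Photon energy (eV) for a wavelength in Angstrom."""
    return HC_EV_ANGSTROM / wavelength_angstrom

def wavelength_angstrom(energy):
    """Inverse conversion: wavelength (Angstrom) for an energy in eV."""
    return HC_EV_ANGSTROM / energy

for label, lam in [("lambda1", 0.98233), ("lambda2", 0.98211),
                   ("lambda3", 0.92), ("lambda4", 1.072)]:
    print(label, lam, round(energy_ev(lam), 1))
```

The inflection-point and peak wavelengths differ by only about 3 eV, which is why MAD data collection needs a finely tunable synchrotron beamline rather than a fixed laboratory source.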

SeMet MAD phasing 

There are a total of 18 possible Se sites per asymmetric unit (nine per molecule). To locate the Se positions, we calculated the anomalous and isomorphous difference Patterson maps at the Harker section (v = 1/2) among data sets collected at wavelengths λ1, λ2 and λ3. A number of peaks were observed, which corresponded to possible Se sites and the cross vectors between them (21). Five Se sites were first manually determined from the Patterson maps. These five sites were used to calculate initial estimates of phases, to compute the difference and Bijvoet difference Fourier syntheses and to search for additional Se sites. Finally, a total of 12 Se sites were determined and confirmed by the two-fold non-crystallographic symmetry (NCS) operator, revealed by a self-rotation function (21).
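The choice of the v = 1/2 Harker section follows from the symmetry of P2₁ (unique b axis). The equivalent positions (x, y, z) and (−x, y + ½, −z) of each Se atom produce a self-vector (2x, ½, 2z), so every site gives a Patterson peak on that plane, from which x and z, but not y, can be read. A minimal sketch in fractional coordinates; the Se position is hypothetical:

```python
def frac(v):
    """Reduce a fractional coordinate to the interval [0, 1)."""
    return v % 1.0

def harker_peak_p21(site):
    """Self-vector between symmetry mates (x, y, z) and (-x, y + 1/2, -z) in P2_1."""
    x, y, z = site
    mate = (-x, y + 0.5, -z)
    return tuple(frac(a - b) for a, b in zip(site, mate))  # = (2x, 1/2, 2z) mod 1

def candidate_sites(peak, tol=1e-6):
    """Invert a Harker peak (u, 1/2, w): x = u/2 or u/2 + 1/2, likewise z; y stays unknown."""
    u, v, w = peak
    if abs(v - 0.5) > tol:
        raise ValueError("peak is not on the v = 1/2 Harker section")
    return [(x, z) for x in (u / 2, u / 2 + 0.5) for z in (w / 2, w / 2 + 0.5)]

site = (0.12, 0.30, 0.41)         # hypothetical Se position
peak = harker_peak_p21(site)      # approximately (0.24, 0.5, 0.82)
print(peak)
print(candidate_sites(peak))      # includes approximately (0.12, 0.41)
```

The fourfold ambiguity in (x, z) and the missing y coordinate are why the remaining sites have to be fixed with cross vectors, difference Fouriers and, here, the NCS operator.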



Table 1. Statistics of experimental SeMet MAD data with rejection criterion I/σ(I) > 2

                                   λ1          λ2          λ3
Wavelength (Å)                     0.98233     0.98211     0.92
Energy (eV)                        12 621      12 624      13 476
Resolution range (Å)               –3.30
Completeness (%)                   94.5        92.2        94.5
R_linear = Σ|I − <I>| / ΣI         0.048       0.051       0.044
<I/σ>                              17.0        16.1        18.6
Observed reflections               29 236      27 108      27 410
Unique reflections                 8 904       8 790       8 913
Anomalous pairs                    7 576       8 212       7 475

Highest resolution shell (Å)       3.36–3.30
Completeness (%)                   90.0        87.5        90.5
R_linear = Σ|I − <I>| / ΣI         0.097       0.101       0.087
<I/σ>                              9.6         8.9         11.1
Unique reflections                 425         391         431
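R_linear in Tables 1 and 3 compares each intensity measurement with the mean of its symmetry-equivalent observations. A minimal sketch of that bookkeeping, assuming the observations are already scaled and grouped by unique reflection index (the scaling done by HKL is omitted, and the intensities below are invented). Programs differ on whether singly measured reflections enter the denominator; here they do:

```python
from collections import defaultdict

def r_linear(observations):
    """R_linear = sum |I - <I>| / sum I over all observations.

    `observations` is a list of (hkl, intensity) pairs; <I> is the mean
    intensity of all observations sharing the same unique hkl."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator

obs = [((1, 2, 3), 100.0), ((1, 2, 3), 110.0), ((1, 2, 3), 90.0),
       ((2, 0, 1), 50.0), ((2, 0, 1), 54.0)]
print(round(r_linear(obs), 4))   # 0.0594
```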



These 12 Se positions were used for MAD phasing by treating the data from each wavelength as a multiple isomorphous replacement experiment with the inclusion of anomalous scattering (MIRAS): native with native anomalous scattering (λ3), derivative isomorphous (λ1) and derivative isomorphous with anomalous scattering (λ2; Table 2). The MAD-MIRAS phases were improved using 40% solvent content by four, four and eight cycles of solvent leveling (24) following each of three envelope determinations (25). The solvent-leveled map was used to refine the NCS operator and to construct the averaging mask. The phases were further improved using 16 rounds of Furey's averaging protocols (25). The electron density was averaged within the mask, the density for each molecule was replaced with the average and the 'averaged' density map was inverted to obtain new phases. The resulting phases were combined with the original solvent-leveled MAD-MIRAS phases. The process was cycled until convergence was obtained (16 cycles).
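The 40% solvent content used for solvent leveling can be cross-checked with the Matthews coefficient V_M = V_cell / (Z·M) and the usual estimate solvent ≈ 1 − 1.23/V_M. A rough sketch, assuming the cell constants given above; Z = 4 molecules per cell (two asymmetric units in P2₁, each holding molecules A and B); and a mass estimated as 323 residues × ~110 Da, which ignores AdoMet and the Se substitution. Under these assumptions the estimate comes out near 43%:

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic Angstrom) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews(cell_volume, molecules_per_cell, mass_da):
    """Matthews coefficient V_M (A^3/Da) and the usual solvent estimate 1 - 1.23 / V_M."""
    vm = cell_volume / (molecules_per_cell * mass_da)
    return vm, 1.0 - 1.23 / vm

volume = monoclinic_cell_volume(48.8, 112.4, 59.3, 109.2)
mass = 323 * 110.0   # assumption: ~110 Da per residue, ligands ignored
vm, solvent = matthews(volume, molecules_per_cell=4, mass_da=mass)
print(round(volume), round(vm, 2), round(solvent, 2))
```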

Refinement 

The starting Cα backbone for molecule A was traced using the skeleton in program O (26) with reference to three maps at 3.3 Å resolution: MAD-MIRAS, solvent flattened and density averaged. The Se positions also provided markers for selenomethionine in the polypeptide chain tracing. The atomic coordinates for molecule B were generated by the two-fold NCS operator. After the initial model building, the atomic model was subjected to refinement against 2.8 Å resolution data from a native crystal (Table 3). Initially, strict NCS was invoked, assuming that the two NCS-related molecules are strictly identical. The two models were refined by simulated annealing and least-squares minimization using the X-PLOR program suite (27). Seven rounds of refinement and model rebuilding brought the crystallographic R factor to 0.22. The model was further refined with restrained NCS, with NCS-related atoms restrained to their average positions. An additional five rounds of refinement, refitting and placement of ordered water molecules brought the R factor to 0.19 and R_free to 0.28 (Table 3).
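The R factor and R_free quoted here (and defined in Table 3) use the same formula, R = Σ| |Fo| − |Fc| | / Σ|Fo|, evaluated on different reflections: R_free uses a test set (about 8% of the data) that is excluded from refinement. A minimal sketch with a random test-set selection; the amplitudes are invented, no scale factor between Fo and Fc is refined, and X-PLOR's own test-set bookkeeping is not reproduced:

```python
import random

def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo| over paired amplitudes (no scaling applied)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)

def split_free_set(n_reflections, free_fraction=0.08, seed=0):
    """Randomly flag about `free_fraction` of reflection indices as the R_free test set."""
    rng = random.Random(seed)
    free = {i for i in range(n_reflections) if rng.random() < free_fraction}
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

f_obs = [120.0, 80.0, 45.0, 200.0, 60.0]
f_calc = [110.0, 85.0, 50.0, 190.0, 66.0]
print(round(r_factor(f_obs, f_calc), 4))   # 0.0713

# In use: refine against the work indices only, then report
# R (work) = r_factor over work indices and R_free = r_factor over free indices.
work, free = split_free_set(len(f_obs))
```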



Table 2. Treatment of SeMet MAD data as MIRAS at 3.3 Å resolution

Native wavelength                  λ3
Derivative wavelength              λ1          λ2             λ3
Isomorphous/anomalous              iso         iso/ano        ano
Phasing power (a)                  2.350       1.830/1.410    1.080
R-Kraut (b)                        0.028       0.028/0.037    0.029
R-Cullis (c)                       0.516       0.601/–
Figure of merit                    0.389       0.365/0.308    0.265
Overall figure of merit            0.619

(a) Phasing power = r.m.s.(<F_H>/E), where F_H is the calculated 'heavy atom' structure factor amplitude and E is the residual lack of closure.
(b) R-Kraut = Σ|F_PH − F_PH,calc| / Σ|F_PH| for centric data and Σ(|F_PH+ − F_PH+,calc| + |F_PH− − F_PH−,calc|) / Σ|F_PH+ + F_PH−| for acentric data.
(c) R-Cullis = Σ||F_PH ± F_P| − F_H| / Σ|F_PH ± F_P| for centric data, where F_PH and F_P are the observed structure factor amplitudes for the 'derivative' and 'native' data sets and F_H is the calculated 'heavy atom' structure factor amplitude.
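For centric reflections the heavy-atom (here Se) contribution adds to or subtracts from F_P along a line, so the quantities in footnotes (a) and (c) reduce to simple arithmetic. A minimal sketch, assuming phasing power is taken as r.m.s.(F_H)/r.m.s.(lack of closure) and the sign in |F_PH ± F_P| is chosen as the one closest to F_H; programs such as MLPHARE instead weight these terms with the refined phase probabilities, and the amplitudes below are invented:

```python
import math

def centric_closure(f_ph, f_p, f_h):
    """For one centric reflection return (|F_PH +/- F_P|, lack of closure).

    Assumption of this sketch: the sign is the one whose isomorphous
    difference lies closest to the calculated heavy-atom amplitude F_H."""
    candidates = (abs(f_ph - f_p), abs(f_ph + f_p))
    diff = min(candidates, key=lambda d: abs(d - f_h))
    return diff, abs(diff - f_h)

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def phasing_statistics(rows):
    """rows: (F_PH, F_P, F_H) for centric reflections -> (phasing power, R-Cullis)."""
    diffs, closures = zip(*(centric_closure(*row) for row in rows))
    phasing_power = rms([f_h for _, _, f_h in rows]) / rms(closures)
    r_cullis = sum(closures) / sum(diffs)
    return phasing_power, r_cullis

rows = [(110.0, 100.0, 12.0), (95.0, 100.0, 4.0), (130.0, 100.0, 25.0)]
pp, rc = phasing_statistics(rows)
print(round(pp, 2), round(rc, 3))
```

A phasing power well above 1 and an R-Cullis well below 1 both indicate that the modeled heavy-atom contribution explains most of the observed differences.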

RESULTS 

Structure determination 

M.PvuII is produced in two forms, resulting from translation initiators 13 codons apart (17). The shorter form of M.PvuII, starting from the internal translation initiator at Met14, was overexpressed in Escherichia coli and purified in both native and selenomethionine-substituted forms (21). The M.PvuII polypeptide chain is 323 amino acids long (numbered 14–336). Diffraction data (Table 1) were collected at three X-ray wavelengths from a crystal of the selenomethionyl M.PvuII–AdoMet complex, so that MAD could be used to extract the phases (23). Following the suggestion of Ramakrishnan (28,29), multiwavelength data were treated as if they were from a conventional MIRAS experiment.



A total of 12 (out of 18 possible) Se sites per asymmetric unit were determined from Patterson maps and were used for MAD phasing with a figure of merit of 0.62 at 3.3 Å resolution. The MAD-MIRAS map, coupled with two-fold non-crystallographic symmetry averaging, was accurate enough to permit an initial interpretation (21), and the model was finally refined to 2.8 Å resolution with a crystallographic R factor of 0.19 and an R_free value of 0.28.



Table 3. Structural refinement of M.PvuII at 2.8 Å resolution

                                   Overall     Highest resolution shell
Wavelength (Å)                     1.072
Resolution range (Å)               –2.8        2.85–2.8
Completeness (%)                   99.7        97.5
R_linear = Σ|I − <I>| / ΣI         0.052       0.226
<I/σ>                              11.5        5.5
Observed reflections               54 787
Unique reflections                 14 886      730
R factor = Σ|Fo − Fc| / Σ|Fo| (a)  0.193
R_free (a)                         0.283
Non-hydrogen protein atoms         4455
r.m.s. deviation from ideality
  Bond lengths (Å)                 0.02
  Bond angles (°)                  2.9
  Dihedrals (°)                    24.9
  Impropers (°)                    2.2

(a) R factor and R_free are calculated for ~92% and 8% of the data, respectively.



Overview of the M.PvuII structure

The polypeptide chain folds into a structure with a V-shaped cleft, big enough to accommodate duplex DNA (Fig. 1). The V-shaped cleft is formed by three loops on one side and a three-helix bundle on the other side. The methyl donor AdoMet binds at the bottom of the cleft, which consists of a twisted 10-stranded β-sheet around which six α-helices are arranged on both sides.

Figure 2 shows the topology diagrams of M.PvuII, M.HhaI and M.TaqI. For clarity and convenience, we retain the nomenclature of Schluckebier et al. (12) for the secondary structure assignment and of Posfai et al. (5) for the conserved motifs. Loops or turns are designated by their flanking secondary structures; two of them are termed the glycine-containing G loop (loop 1-A) and the proline-containing P loop (loop 4-D) (10). The catalytic domains of the three structures are all of the α/β type, with a central β-sheet sandwiched between two layers of α-helices: helices αC, αD and αE are located on one side and helices αZ, αA and αB on the opposite side of the sheet (Fig. 2a). The β-sheets in the three structures all contain five central adjacent parallel β-strands with strand order 5, 4, 1, 2, 3 and one antiparallel hairpin (β6 and β7) next to strand β5. The order of parallel strands is reversed once, between β4 and β1. The majority of the active amino acids from conserved motifs (circled in Fig. 2) are located at the carboxyl ends, or in loop regions outside the carboxyl ends, of these parallel β-strands. In all three structures the AdoMet binding site is located at the carboxyl ends of strands β1 and β2 and the amino end of helix αC, and the active site at the carboxyl ends of strands β4 and β5 and the amino end of strand β7 (see below). The N- and C-termini of the folded polypeptide are within the AdoMet binding region in all three structures: located in the region between helix αZ and strand β1 (M.HhaI in Fig. 2b), prior to helix αZ (M.TaqI in Fig. 2c) and between helix αB and strand β3 (M.PvuII in Fig. 2d).

Figure 1. Ribbon (70) diagram of M.PvuII. (a) The protein folds into a structure with a V-shaped cleft. AdoMet (in ball-and-stick representation) is bound at the bottom of the cleft. The regions of M.PvuII that are structurally most similar to M.HhaI and M.TaqI are shown in brown and the less similar regions are shown in white and green. The green region is part of the putative DNA target recognition domain (TRD). The catalytic P loop is in pale blue and red. The pale blue part contains conserved amino acids Ser53-Pro-Pro-Phe56 and the red part is flexible (high thermal factors), consistent with potential conformational change upon DNA binding. (b) M.PvuII docked to cognate DNA, taken from the R.PvuII–DNA structure (19). The DNA phosphate backbone and sugar rings are in purple, the DNA bases are in green and AdoMet is in yellow. (c) M.PvuII docked to DNA with a flipped cytosine (see Fig. 5).
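The point where the parallel strand order 5, 4, 1, 2, 3 reverses, between β4 and β1, is the topological switch point discussed later in connection with the β3–β4 crossover. A simplified sketch that locates it from the strand order alone, assuming consecutively numbered strands each connected to the next one, and considering only the five parallel strands (strand direction and crossover handedness are ignored):

```python
def switch_points(spatial_order):
    """Find topological switch points in a sheet of consecutively numbered strands.

    `spatial_order` lists strand numbers left to right across the sheet. A switch
    point is reported between adjacent strands a (left) and b (right) when the
    connection leaving a (to strand a + 1) runs leftwards while the connection
    leaving b (to strand b + 1) runs rightwards, i.e. the loops diverge."""
    pos = {strand: i for i, strand in enumerate(spatial_order)}

    def heading(strand):
        nxt = strand + 1
        if nxt not in pos:
            return None  # last strand, or its successor is outside this sheet segment
        return "left" if pos[nxt] < pos[strand] else "right"

    return [(a, b) for a, b in zip(spatial_order, spatial_order[1:])
            if heading(a) == "left" and heading(b) == "right"]

print(switch_points([5, 4, 1, 2, 3]))   # [(4, 1)]  methylase fold
print(switch_points([3, 2, 1, 4, 5]))   # [(1, 4)]  classic Rossmann-type order
print(switch_points([1, 2, 3, 4, 5]))   # []        no reversal, no switch point
```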
Figure 2. Topology diagrams. (a) The consensus methylase fold, derived from DNA Mtases (M.HhaI, M.HaeIII, M.TaqI and M.PvuII) and one small molecule AdoMet-dependent Mtase (catechol O-Mtase). The main feature of the fold is a region of five parallel β-strands (5, 4, 1, 2, 3) followed by an antiparallel β-hairpin (strands 6 and 7), surrounded by six helices, three (αC, αD, αE and αZ, αA, αB) on each side of the β-sheet. The dashed loops can be broken to become the N- and C-termini. (b) M.HhaI, (c) M.TaqI and (d) M.PvuII diagrams indicate their similarity in the catalytic domains. α-Helices are shown as rectangles (lettered) and β-strands as broad arrows (numbered). Common elements of secondary structure among the three enzymes are shown in similar positions. Conserved or functionally important amino acids from motifs I–X are circled. The β-strands (ten in M.PvuII, seven in M.HhaI and nine in M.TaqI) form a β-sheet. In M.PvuII, a region (dashed line) between strands β7 and β8 is not modeled in the current structure. (e) Predicted topological folding for group β amino-Mtases from an earlier study (10).



M.PvuII fits the consensus fold for AdoMet-dependent Mtases

The TRD, which is associated with sequence-specific DNA recognition, lies in the smaller domain of the three bilobal Mtases discussed above. In the current structure of M.PvuII, this domain comprises only one helix (αF) and its associated loops (Fig. 2d). It is instructive to compare how the TRD is connected to the catalytic domain in the three Mtases. In 5mC Mtases such as M.HhaI, helix αZ (motif X, part of the catalytic domain) is folded from the C-terminus, following the TRD. Thus, there are two connections between the catalytic domain and the TRD (Fig. 2b). In the group γ N6mA Mtase M.TaqI, helix αZ originates from the N-terminus and the TRD is linked to the catalytic domain through β9 only (Fig. 2c). Thus, in both M.TaqI and M.HhaI the functional regions are in the order (amino→carboxyl) AdoMet binding region, active site region and TRD, the major difference between them being that helix αZ is moved from the N-terminus in M.TaqI to the C-terminus in M.HhaI.

As predicted (10), the most pronounced difference in topology between M.PvuII and both M.HhaI and M.TaqI is the connection between the AdoMet binding and active site regions: the two regions are connected via the putative TRD (helix αF) in the order (amino→carboxyl) active site region, TRD and AdoMet binding region (Fig. 2d). The active site and AdoMet binding regions of M.PvuII fit the consensus structure of M.HhaI/M.TaqI/M.HaeIII/catechol O-Mtase, regardless of the motif order in the primary sequence. We call this common catalytic domain structure the AdoMet-dependent methylase fold (Fig. 2a). This fold has also been observed in the RNA Mtase VP39, though helix αE is replaced by a β-strand (14).



We had predicted the folding of group β amino-Mtases, including M.PvuII (Fig. 2e), based on structure-guided sequence analysis (10). Overall, the prediction is quite accurate, though there are some significant differences between the prediction and the current model. Unexpectedly, part of the AdoMet binding region (β3-αC or motif III) is located upstream of the active site region, near the N-terminus of the polypeptide. This arrangement preserves the crossover between strands β3 and β4, but splits the coding for the AdoMet binding region into two distant parts of the gene. The β3-αC secondary structure (motif III) was predicted to originate from the C-terminus as a contiguous part of the AdoMet binding region. This prediction, which would result in no crossover connection between strands β3 and β4, was made in part because of the very short distance between the N-terminus of another group β Mtase (M.BamHI) and its strand β4 (Fig. 3). However, this crossover has been observed in all currently available DNA Mtase structures (5mC Mtases M.HhaI and M.HaeIII, group β Mtase M.PvuII and group γ Mtase M.TaqI), as well as in catechol O-Mtase, glycine N-Mtase and the RNA Mtase VP39, and is predicted to occur in group α amino-Mtase structures, with the crossover connection in a separate domain comprising the TRD (10).

Such a crossover is necessary to generate a so-called topological switch point (30), at which the strand order is reversed and loops connected to the carboxyl ends of the two adjacent strands (β1 and β4 in Fig. 2a) go in opposite directions. The positions of concave active sites can be predicted from such switch points in different types of α/β twisted open sheet structures, including arabinose binding protein (31), carboxypeptidase (32) and tyrosyl-tRNA synthetase (33).

Figure 3. Sequence alignment of group β amino-Mtases including eight N4mC and nine N6mA Mtases (there is some uncertainty on assignments, particularly for M.tfmfl). Conserved amino acids are grouped as (E, D, Q, N), (V, L, I, M), (F, Y, W), (G, P, A), (K, R) and (S, T), using standard one-letter abbreviations. Invariant amino acids are shown as white letters against a black background; conserved positions are indicated by bold letters within a box. Lesser degrees of conservation are shown, in decreasing order, by bold and upper case letters, while non-conserved positions are shown as lower case letters. A dash (-) indicates a deletion relative to other sequences and a slash (/) followed by a number indicates an insertion and its size. Motifs I–X are labeled using the nomenclature of Posfai et al. (5). The secondary structures of M.PvuII are indicated by cylinders (α-helices) and arrows (β-strands) drawn directly above the amino acids forming them. The dashed line indicates a flexible region (amino acids 179–216) which is not modeled in the current structure. This region includes four out of five preferred trypsin cleavage sites, indicated by arrows (34).
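The residue grouping used for the Figure 3 alignment can be turned into a simple per-column classification. A minimal sketch covering only the top two tiers ("invariant" and "conserved within one group"); the finer "lesser degrees of conservation" in the figure are not modeled, and treating any deletion as breaking conservation is an assumption of this sketch:

```python
# Residue groups from the Figure 3 legend.
GROUPS = ["EDQN", "VLIM", "FYW", "GPA", "KR", "ST"]

def group_of(residue):
    """Return the group containing `residue`, or the residue itself if it is ungrouped (e.g. C, H)."""
    for group in GROUPS:
        if residue in group:
            return group
    return residue

def classify_column(column):
    """Classify one alignment column given as a string of one-letter codes ('-' = deletion)."""
    residues = column.upper()
    if not residues or "-" in residues:
        return "not conserved"            # assumption: any deletion breaks conservation
    if len(set(residues)) == 1:
        return "invariant"
    if len({group_of(r) for r in residues}) == 1:
        return "conserved"
    return "not conserved"

for col in ["DDDDDDDDD", "DEQN", "KRRK", "DKF", "D-D"]:
    print(col, classify_column(col))
```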

Disordered regions 

As noted above, there are two molecules per crystallographic asymmetric unit, termed molecules A and B. The current model of molecule A contains residues a16–a178, a217–a335 and one AdoMet, while molecule B contains residues b16–b56, b69–b178, b215–b335 and one AdoMet. The r.m.s. deviation between 269 common Cα atoms of the two final refined molecules is 0.6 Å. In both molecules ~40 amino acids (Pro179–Gly216), located immediately after strand β7 and before strand β8, were not modeled in the current structure because of poor electron density. This poor density suggests that these amino acids are very flexible. Consistent with this flexibility, four out of five preferred trypsin cleavage sites are within this 40 amino acid region: the primary cleavages occur on the carboxyl sides of Arg183 and Lys186 and are followed by slower cleavages carboxyl to Lys198, Lys208 and Arg323 (34). In fact, SDS-PAGE analysis of dissolved crystals indicates that some M.PvuII crystals contained limited amounts of protein that had been cleaved in this region. It is noteworthy in this regard that some 5mC Mtases are naturally made as two separate polypeptides that associate in the cell to form active enzyme (35,36).
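The 0.6 Å figure is an r.m.s. deviation over matched Cα pairs after molecule B is superposed on molecule A. A minimal sketch, assuming the superposing operator (R, t) is already known, for example the refined NCS operator; the two-fold axis along z and the coordinates below are hypothetical, and finding the optimal superposition itself (for instance with the Kabsch algorithm) is not shown:

```python
import math

def apply_operator(rotation, translation, xyz):
    """Apply x' = R x + t to one (x, y, z) triple; R is a 3x3 list of rows."""
    return tuple(sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
                 for i in range(3))

def rmsd_after_operator(coords_a, coords_b, rotation, translation):
    """r.m.s. deviation between matched atoms of A and atoms of B mapped by (R, t)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two equal-length, non-empty lists of matched atoms")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        moved = apply_operator(rotation, translation, b)
        total += sum((p - q) ** 2 for p, q in zip(a, moved))
    return math.sqrt(total / len(coords_a))

# Hypothetical NCS operator: two-fold rotation about z, no translation.
TWOFOLD_Z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
NO_SHIFT = (0.0, 0.0, 0.0)

ca_a = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 10.0)]        # made-up C-alpha positions
ca_b = [(-1.0, -2.0, 3.6), (-4.0, -5.0, 6.0), (-7.0, -8.0, 10.0)]  # mate, one atom off by 0.6 A
print(round(rmsd_after_operator(ca_a, ca_b, TWOFOLD_Z, NO_SHIFT), 3))   # 0.346
```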
In molecule B, part of the catalytic P loop (amino acids 57–68) was also not modeled due to poor electron density. However, the corresponding P loop in molecule A was modeled, though Leu58–Asn66 (red in Fig. 1) possessed the highest crystallographic thermal factors in the current refined structure. This flexibility may be due to the absence of DNA in the crystal and suggests a potential conformational change upon DNA binding. Similarly, the catalytic P loop in M.HhaI, which contains the key catalytic amino acids Pro80-Cys81, undergoes a massive conformational change upon binding DNA, moving ~25 Å toward the corresponding DNA binding cleft of the protein (1).

AdoMet binding 

The binding site for AdoMet is adjacent to the carboxyl ends of strands β1 and β2, the amino end of helix αC and the loop prior to helix αZ, regions that contain conserved motifs I, II, III and X, respectively (Fig. 4). The interactions between AdoMet and M.PvuII are almost identical to those between AdoMet and M.HhaI (1), M.TaqI (11), catechol O-Mtase (13) and VP39 (14). Amino acid side chains interacting with AdoMet are found in spatially equivalent positions, except that Phe273 of M.PvuII and Phe18 of M.HhaI are in the G loop, while the corresponding Phe146 of M.TaqI is in helix αD (Fig. 2).

In motif I of group β Mtases (Asp-X-Phe-X-Gly), the amino acids Asp and Gly are invariant. In M.PvuII these correspond to Asp271 and Gly275 (Fig. 3). The side chain carboxylate of Asp271 (β1) makes two hydrogen bonds, to the main chain amide group of Phe273 (G loop) and to the side chain hydroxyl of Thr279 (αA), and these bonds stereochemically constrain the β1-loop-αA structure. A negatively charged amino acid corresponding to Asp271 has been found in the same position of motif I in all DNA Mtases sequenced so far, including Asp16 of M.HhaI and Glu45 of M.TaqI (9,10). The main chain amide group of Gly275 (G loop) hydrogen bonds to the side chain carboxylate of Glu294 (β2), another conserved negatively charged amino acid (motif II), which interacts with the ribose hydroxyls of AdoMet. Comparable backbone-side chain interactions occur in M.HhaI (Gly20-Glu40) and M.TaqI (Ala49-Glu71).

In M.PvuII the AdoMet binding α/β cluster (αZ→β1→αA→β2→αB) is further stabilized by the interactions of Arg288 with the side chain of Thr263 (loop Z-1) and with the backbone carbonyls of both Thr263 (loop Z-1) and Glu286 (loop A-2). Arg288 is an invariant arginine among group β Mtases, located prior to strand β2 (Fig. 3). Only three structurally characterized AdoMet binding proteins interact with AdoMet in substantially different ways from the nucleic acid Mtases and catechol O-Mtase. The first is the E. coli MetJ repressor (37), for which AdoMet is a co-repressor and not a substrate. The second is the reactivation domain of E. coli methionine synthase (38), which uses AdoMet in a flavodoxin-coupled reductive methylation of cobalamin. The third is glycine N-Mtase, which does have a region structurally very similar to the consensus AdoMet-binding regions, though that is not where AdoMet was bound in the reported structure (15).

AdoMet binding and target base binding sites are structurally similar to one another

The M.PvuII protein has approximate two-fold pseudosymmetry around the center of the cleft, due in part to the structural similarity of the AdoMet binding site to the active site. These sites are each dominated by comparable α/β clusters, αZ→β1→αA→β2→αB and αC→β4→αD→β5→αE. The former includes motifs I, II and X and forms the bulk of the AdoMet binding region; the latter includes motifs IV-VI and forms the bulk of the active site region. The two α/β clusters can be superimposed by rotating strands β1 and β2 onto strands β4 and β5 (Fig. 4b). This yields an r.m.s. deviation of 0.7 Å for the Cα atoms of these β-strands. Similar superimposability has also been observed for the α/β clusters of the 5mC Mtases M.HhaI and M.HaeIII and the N6mA Mtase M.TaqI (10). This observation has led to the suggestion that the original Mtases arose after gene duplication converted an AdoMet binding protein into a protein that bound two molecules of AdoMet (see also 39-42), and that the two halves then diverged (10). Regardless of the evolutionary model, the M.PvuII structure suggests that this internal structural repeat is a feature common to most AdoMet-dependent Mtases. Only the reactivation domain of E. coli methionine synthase does not fit this pattern (38).
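The 0.7 Å figure is a root-mean-square deviation over matched Cα atoms after the best rigid-body superposition, RMSD = sqrt((1/N) Σ |R·p_i + t − q_i|²). A minimal sketch of that calculation follows. It assumes the standard Kabsch (SVD) superposition and NumPy. The coordinates in the usage lines are synthetic and are not M.PvuII data, so the published value is not reproduced.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    if mobile.shape != target.shape:
        raise ValueError("coordinate sets must have the same shape")
    # Remove translation by centering both sets on their centroids.
    p = mobile - mobile.mean(axis=0)
    q = target - target.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centered mobile set onto the centered target set.
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(mobile)))


# Synthetic example: ten "Calpha" positions and a rotated, shifted copy.
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 3)) * 5.0
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = target @ rot_z.T + np.array([3.0, -2.0, 7.0])
print(f"RMSD after superposition: {kabsch_rmsd(mobile, target):.6f} Å")
```

For a pure rotation plus translation the result is zero up to rounding. Real structures give a nonzero residual, such as the 0.7 Å reported for strands β1/β2 versus β4/β5.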

DISCUSSION 

Predicted DNA binding and base flipping 

It is very likely that the V-shaped cleft of the protein is where DNA binds. In the absence of large-scale protein conformational changes, the cleft is large enough to accommodate double-stranded DNA without steric hindrance (Fig. 1b). Positively charged groups capable of interacting with the DNA phosphate backbone are prominent on the surface of the cleft. They come from the P loop (Arg60-Lys-Lys62), loop 5-E (Lys103 and Arg108) and loops 6-7 (Lys138, Lys148-Arg-Lys150, Arg152 and Lys154). We have docked a 13mer B-DNA duplex containing the PvuII recognition sequence, taken from the R.PvuII-DNA structure (19), against the basic face of the cleft (Fig. 1b). The fit of the DNA in the cleft is extremely convincing, with the protein occupying a distance of ~37 Å along the axis of the double helix. This suggests that M.PvuII intimately contacts a 10 nt stretch that includes the 6 nt recognition sequence.
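The link between the ~37 Å contact length and the ~10 nt estimate is a simple conversion using the helical rise of B-DNA. The sketch below assumes the textbook rise of about 3.4 Å per base pair; the exact rise varies with sequence and local conformation.

```python
B_DNA_RISE = 3.4  # approximate rise per base pair along the B-DNA helix axis, in Å


def base_pairs_spanned(length_angstrom: float, rise: float = B_DNA_RISE) -> float:
    """Number of base pairs covered by a footprint of the given axial length."""
    return length_angstrom / rise


footprint_bp = base_pairs_spanned(37.0)    # protein contact length from docking
recognition_length = 6 * B_DNA_RISE        # 6 bp CAGCTG site, in Å
print(f"37 Å is about {footprint_bp:.1f} bp; the 6 bp site spans about {recognition_length:.1f} Å")
```

The result is roughly 11 bp. That is in line with a contact stretch of about 10 nucleotides, which comfortably includes the ~20 Å occupied by the recognition sequence.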

The M.HhaI-DNA structure provided the first example of base flipping (1). Several other types of enzymes are now also known or believed to use this approach (43,44), including the DNA repair enzymes T4 endonuclease V and human uracil-DNA glycosylase (45,46). Base flipping is a process by which an enzyme rotates a DNA nucleotide out of the double helix, breaking only the base pairing hydrogen bonds, and traps it in a protein binding pocket. The M.PvuII structure is consistent with a base flipping mechanism. In our docking model the DNA is positioned such that the target cytosine is in the helix and the NH2 group to be methylated is far from the active CH3 group of AdoMet (Fig. 1b). Thus it is likely that M.PvuII (an amino-Mtase) flips the cytosine out of the DNA helix to access the target amino group (Fig. 1c), in a manner similar to that employed by the 5mC Mtases M.HhaI and M.HaeIII (1-4). The structure of M.TaqI and spectroscopic data for M.EcoRI suggest that these two amino-Mtases also flip the target adenine out of the DNA (47,48).

Although it is possible to predict where DNA binds, we cannot identify any known DNA binding motifs in the current structure that might be responsible for DNA sequence specificity. Furthermore, there is no obvious similarity between M.PvuII and the structures of R.PvuII or MyoD in complex with DNA (19,49); both are homodimeric proteins recognizing the same DNA sequence, CAGCTG. R.PvuII uses a β-ribbon motif to interact with nucleotides in the DNA major groove, while the myogenic transcription factor MyoD is a basic helix-loop-helix protein. The lack of obvious similarity may reflect the disparate roles of these three CAGCTG-recognizing proteins. DNA Mtases carry out base flipping (within specific nucleotide sequences) so they can access the atom to be methylated on the target nucleotide. Such a mechanism is not required for other sequence-specific proteins, such as transcription factors (for which specific binding is the main role) and restriction endonucleases (which act only on the readily accessible DNA phosphate backbone).

Figure 4. AdoMet binding site. (a) AdoMet is involved in contacts with four regions, formed by four motifs: the G loop (motif I, in brown), strand β2 (motif II, in pale blue), helix αC (motif III, in purple) and loop F-Z (motif X, in green). (b) Superimposition of the two α/β clusters. The first cluster, αZ→β1→αA→β2→αB, in white, is rotated (now in green) with respect to the second cluster, αC→β4→αD→β5→αE, in brown, to achieve the most overlap possible. Also shown are the positions, relative to the respective α/β clusters, of the AdoMet adenosyl moiety (white and green) and the target cytosine ring, in brown (inferred from the M.HhaI-DNA structure; see Fig. 5).

As mentioned before, only two 5mC Mtases, M.HhaI and M.HaeIII, have been structurally characterized in complex with their DNA substrates. The protein-base contacts in the recognized sequence are expected to differ between M.HhaI and M.HaeIII because of their different specificities, and indeed the folding of the corresponding TRDs is different (2). However, both TRDs share a common feature: two recognition loops (1,2,44). In the M.PvuII structure, two loops (prior to and after helix αF) on the other side of the V-shaped cleft could easily fit into the concave face of the major or minor groove of B-form DNA. These two loops, which may correspond to the two 5mC recognition loops, are held in place by a scaffold made up of three helices, αF, αB and αB1. A similar pair of recognition loops has also been proposed for M.TaqI (47). The reason for such conservation may be that sequence recognition is part of the base flipping mechanism. Loops, rather than the more rigid structures of α-helix or β-strand, would then be used to discriminate DNA sequences flexibly and effectively.

Predicted catalytic mechanism for DNA amino methylation 

What we call the catalytic P loop of the amino-Mtases was found in early sequence comparisons and called an 'Asp-Pro-Pro-Tyr' motif based on its sequence (50,51). A later comparison suggested it might correspond to Pro-Cys (motif IV) in 5mC Mtases, even though the reaction mechanisms of the two families of Mtases appear to be quite distinct (52). The structural comparison of M.HhaI and M.TaqI has confirmed that the Pro-Cys and Asn-Pro-Pro-Tyr motifs of these two enzymes are spatially equivalent (12). By analogy, both are therefore referred to as motif IV (10). Motif IV has the consensus sequence Ser-Pro-Pro-Tyr for N4mC Mtases, Asp-Pro-Pro-Tyr for group α and β N6mA Mtases and Asn-Pro-Pro-Tyr for group γ N6mA Mtases (10,53,54). However, as we discuss below, the Ser→Asp→Asn variation is not expected to represent an essential functional difference. We note that these consensus sequences are not absolute, and it is still difficult to distinguish N4mC from N6mA Mtases on the basis of amino acid sequence alone (see Fig. 3).

The flipped cytosine, taken from the M.HhaI-DNA structure, can be docked surprisingly well into the M.PvuII active site, located at the bottom of the V-shaped cleft. When the common α/β-sheet structures are superimposed, the M.HhaI active site amino acids from the catalytic P loop and strands β5 and β7 overlap the corresponding amino acids in M.PvuII. Gly78-Phe-Pro-Cys81 overlaps Ser53-Pro-Pro-Phe56 (P loop), Glu119 overlaps Asp96 (β5) and Arg165 overlaps Asn158 (β7) (Fig. 5a). In M.HhaI these amino acids interact with the target cytosine: Arg165 with O2, Glu119 with N3 and N4, and the main chain carbonyl of Phe79 with N4, while Cys81 covalently bonds to C6. Although M.PvuII also interacts with cytosine, we do not observe the same amino acids in the same structural elements of M.PvuII. However, as noted above, different amino acids are spatially equivalent in the two enzymes.

One can easily model the interactions between the polar edge of the flipped cytosine and M.PvuII (shown in brown in Fig. 5a). The target atom, cytosine N4, would have two possible hydrogen bond partners: the hydroxyl group of Ser53 and the main chain carbonyl of Pro54 (the first two amino acids of the highly conserved motif IV). Also from this conserved motif, the phenyl ring of Phe56 could make van der Waals contacts with the cytosine ring; Phe56 occupies a position similar to Cys81 of M.HhaI. Asn158 (β7), which does not appear to be conserved among the N4mC Mtases, might hydrogen bond to cytosine O2.

Asp96 (Asn in most of the other N4mC Mtases) may hydrogen bond with and activate the Ser53 hydroxyl group (Asp96:Oδ2...Ser53:Oγ = 2.7 Å), thereby facilitating proton transfer from the cytosine amino group through the Ser and eventually to the Asp (Fig. 6a). If this occurs, the protonated Asp96 might then hydrogen bond to N3 of the cytosine. Ser53 and Asp96 thus appear to belong to a charge relay system analogous to that seen in the serine proteinases (55).

Most importantly, the distance from the AdoMet methyl group to the cytosine N4 is ~4 Å in our docking model, sufficiently close to permit methyl group transfer. For comparison, in the structures of M.HhaI-DNA complexes the distance from the substrate cytosine C5 to the AdoMet methyl group is ~2.9 Å (56), and the distance from the product 5mC methyl group to the AdoHcy sulfur is also ~2.9 Å (4). Thus, our model suggests that methylation of the exocyclic amino group results from a direct attack of the activated cytosine N4 on the AdoMet methyl group, in analogy with the previously proposed mechanism for DNA adenine methylation (12,57,58).

In the group β N4mC Mtases, Ser53 of M.PvuII is conserved except in M.BamHI, which has Asp at this position (Fig. 3). A conserved Asp is present at the same place in the group β N6mA Mtases, as well as in the group α N6mA Mtases (10). Modeling suggests that an Asp at this position of the P loop could interact with cytosine N4 and N3 (M.BamHI) or with adenine N6 and N1 (Fig. 6b). The Asp carboxyl group could hydrogen bond with cytosine N4 (NH2) or adenine N6 (NH2). This would increase the nucleophilicity of the nitrogen, and the carboxyl would serve as a trap for the amino proton that leaves when the methyl group is transferred from AdoMet to the nitrogen. The protonated carboxyl group could then hydrogen bond with cytosine N3 or adenine N1. If this is correct, the conserved Asp in M.BamHI and the N6mA Mtases may be functionally comparable to Asp96 in M.PvuII. Ser53 in M.PvuII may compensate for the fact that Asp96 is too far from the cytosine N4 for direct interaction (Figs 5a and 6a). This does not, however, explain why most N4mC Mtases do not simply have Asp in place of Ser, as seen in M.BamHI.

When the structures of M.PvuII and M.TaqI, a group γ N6mA Mtase, are superimposed at their common α/β-sheet structures, Asn105 of Asn-Pro-Pro-Tyr (P loop) in M.TaqI is present in place of Ser53 of Ser-Pro-Pro-Phe in M.PvuII. In addition, two hydrophobic amino acids of M.TaqI, Phe196 (loops 6-7) and Val163 (β5), occupy the positions of two polar/charged groups of M.PvuII (Asn158 and Asp96). These hydrophobic amino acids, particularly Phe196, are likely to make van der Waals contacts with the target nucleotide (32). The carboxamide of M.TaqI Asn105 could interact with both adenine N6 and N1 (Fig. 6c), similar to the role Asn229 of thymidylate synthase plays in hydrogen bonding to dUMP (see figure 3 of 59). However, Asn229 of thymidylate synthase plays a contributory but non-essential role in catalysis (60). In contrast to Asp or Ser, Asn is unlikely to accept a proton. Therefore, the only role currently apparent for the Asn of the group γ N6mA Mtases is in positioning the substrate adenine. Methylation would then result from direct transfer of the AdoMet methyl group to the adenine N6, with a general base (possibly a highly ordered water molecule) assisting the proton transfer that occurs at N6.

A second AdoMet molecule 

Some Mtases, including M.PvuII, appear to bind two molecules of AdoMet (34,61,62), one of which affects the selectivity of the protein towards substrate and non-specific DNA sequences (for M.EcoDam; 61,62). Extra electron density (from 2Fo-Fc and Fo-Fc maps and from initial MAD-MIRAS maps) was found near the first AdoMet in molecule A. This may be a second AdoMet, as the density can be fitted well to an AdoMet adenosyl moiety with the methionine moiety extending into the solvent. This second AdoMet binding site is formed by the first AdoMet molecule at the bottom, by Tyr299 (αB) and His246-Pro247 (loop F-Z) on one side, and by the main chains of Pro55 and Phe56 (P loop) on the other (Fig. 5b). The adenine sits above the ribose ring of the first AdoMet. Most interestingly, the ribose oxygens of the second AdoMet interact with the side chain of Glu37 of a crystallographic symmetry-related molecule B. This interaction, analogous to the interaction of the first AdoMet with Glu294 (β2, motif II), may stabilize the second AdoMet in molecule A, which has a different crystal packing environment.

2712 Nucleic Acids Research, 1997, Vol 25, No. 14

Figure 5. Active site. (a) Superimposition of the active sites in M.PvuII (brown) and M.HhaI (green). Amino acids shown are from the P loop (motif IV) and strands β5 (motif VI) and β7 (motif VIII) (see Fig. 2). In the complex between M.HhaI and a transition state analog substrate, Cys81 is linked by a covalent bond (yellow) to C6 of the target cytosine (1). The cytosine is recognized by a number of hydrogen bonds (in white). (b) Close-up GRASP (71) representation displayed at the level of the solvent accessible surface, color coded purple for positive (20 kBT), red for negative (-20 kBT) and white for neutral, where kB is Boltzmann's constant and T is the temperature. The AdoMet and the modeled target cytosine ring are in stick representation, with yellow for carbon, purple for nitrogen, red for oxygen and green for sulfur. The second AdoMet binding site is formed by the first AdoMet molecule at the bottom, Tyr299 and Pro247 on the left side, and the main chains of Pro55 and Phe56 (between Ser53 and Arg60) on the right side.

Figure 6. Proposed reaction mechanisms for amino-Mtases with the P loop containing (a) Ser (as in M.PvuII), (b) Asp and (c) Asn. A general base (B:), which could be a water molecule, might be needed to eliminate the proton.



However, despite the structural similarity of the AdoMet binding and active sites (Fig. 4b), this second AdoMet molecule does not occupy the active site (Fig. 5b). Instead, the second AdoMet occupies a space equivalent to the solvent channel in the M.HhaI-DNA structure. In that structure, a network of well-ordered water molecules mediates contacts between the target cytosine, AdoHcy and M.HhaI (see figure 3 of 4); this network includes the water proposed as the general base for eliminating the C5 proton.

Evolutionary relationships among the DNA Mtases 

As noted above, the structure of M.PvuII confirms two predicted features of DNA Mtase structure. First, all DNA Mtases structurally characterized to date have AdoMet adenine binding pockets that are superimposable onto their methylatable base binding pockets (10; see Fig. 4b). Second, all DNA Mtases structurally characterized to date share a common α/β architecture for their catalytic domains, making different topological connections to accommodate the permuted linear orders of functional regions in their genes (see Fig. 2).

These two features have implications for models of the evolutionary relationships among DNA Mtases. The internal symmetry provided by the two binding pockets, each formed by a comparable set of α helices and β strands, is suggestive of evolution by gene duplication (10). Subsequent gene fusion could have converted the resulting small molecule Mtase to a DNA Mtase by adding a TRD. Some DNA Mtases are still produced as two separate pieces that associate to form active enzyme; one piece is essentially the TRD and the other is the catalytic domain (35,36).

The second feature, common structure despite permuted gene orders, raises a question: do the four groups of DNA Mtases (α, β, γ and 5mC) represent divergence from a common ancestor or convergence from separate Mtase lineages? Matthews et al. (63) have proposed six criteria for distinguishing divergence from convergence:
- the DNA sequences of the genes should be similar;
- the amino acid sequences of the proteins should be similar;
- the three-dimensional structures should be similar;
- the enzyme-substrate interactions should be similar;
- the catalytic mechanisms should be similar;
- '...those segments of the polypeptide chain that are critical for catalysis are in the same sequence in the respective proteins (i.e. insertions and deletions are allowed, but not transpositions)'.

There is as yet no structure for a Mtase of the α group. Mtases from the other three groups (where the information is known) satisfy all except the last criterion (Fig. 2), because the DNA Mtase groups have their major functional regions in three permuted gene orders (10). We can only note two further points. First, several proteins have been found to remain structurally and functionally intact following circular permutation of their genes (64-68). Second, genetic mechanisms for gene permutation have been proposed (69). Whether convergence or divergence describes the relationship between the DNA Mtases, it is clear that N4mC Mtases such as M.PvuII do not represent a separate subfamily of enzymes.
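Circular permutation of a gene corresponds to a cyclic rotation of the linear order of its encoded regions. A minimal check for whether one order is such a rotation of another is sketched below. The region labels are hypothetical placeholders chosen for illustration. They are not the actual motif orders of the α, β, γ or 5mC groups, and the sketch makes no claim about which group orders are related this way.

```python
from typing import Sequence


def is_circular_permutation(order_a: Sequence[str], order_b: Sequence[str]) -> bool:
    """True if order_b is a cyclic rotation of order_a (same elements, shifted)."""
    a, b = list(order_a), list(order_b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of a + a
    return any(doubled[i:i + len(a)] == b for i in range(len(a)))


# Hypothetical region labels, for illustration only.
reference = ["AdoMet-binding", "catalytic", "TRD"]
print(is_circular_permutation(reference, ["TRD", "AdoMet-binding", "catalytic"]))  # True
print(is_circular_permutation(reference, ["catalytic", "AdoMet-binding", "TRD"]))  # False
```

The second order swaps two regions rather than rotating them. A single circular permutation cannot produce it.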

NOTE ADDED IN PROOF 

Since acceptance of this paper, the structure of an AdoMet-dependent protein methyltransferase has been published (72). The Salmonella typhimurium CheR protein matches the consensus Mtase structure very well, including the binding of AdoHcy in the expected AdoMet pocket.
2. I am an expert in the field of synthetic chemistry and was an expert at the time of the invention. At the time of the invention I was employed as a group leader at Max-Planck-Gesellschaft zur Foerderung der Wissenschaften, assignee of the above-referenced patent application. Presently I am a Professor of Organic Chemistry at the RWTH Aachen (Rheinisch-Westfälische Technische Hochschule; Technical University of Aachen). My resume is attached as documentation of my credentials.



3. I declare that one skilled in the art at the time of the invention could have successfully made and used the claimed compounds using only routine screening of alternatives. Such a person would have used the teaching of the specification, including the exemplary protocols set forth in Examples 1 and 2 (pages 19 to 30 of the specification) and variations thereof, together with other protocols known in the art at the time of the invention.
As set forth in the specification, compounds of formula (I):

[Formula (I): chemical structure not reproduced]

can be prepared by the following exemplary Reaction Scheme, in which Y is N, R is -NH(CH2)4NHR4, R4 is dansyl, and R2 is hydrogen (see also Reaction Scheme 6 on page 14 of the specification):
In particular, reaction of 8-bromo-2',3'-O-isopropylidene adenosine with 1,4-diaminobutane yields the protected adenosine derivative with an aminolinker at the 8 position (see, e.g., Compound 1.1 of Example 2 in the specification). Three steps then lead to the protected adenosine derivative with a fluorescent marker on the 8 position (see, e.g., Compound 1.2 of Example 2 in the specification): transient protection of the 5'-hydroxy group with Si(CH3)3Cl, coupling of dansyl chloride with the primary amine of the aminolinker, and removal of the 5'-hydroxyl protecting group. This intermediate is then reacted with mesyl chloride to yield the mesylate derivative (see, e.g., Compound 1.3 of Example 2 in the specification). Removal of the isopropylidene protecting group under acidic conditions, followed by reaction with aziridine, affords a cofactor of formula (I).

In a similar manner, other compounds of formula (I) wherein Y is N, represented below as formula (Ia), can be readily synthesized from a compound of formula (IIa), as set forth below, according to the teaching of Reaction Scheme 6 and Example 2 of the specification described above. The compound of formula (IIa), i.e., 8-bromoadenosine, was commercially available from Aldrich Co. at the time the above-identified patent application was filed.

[Formulae (Ia) and (IIa): chemical structures not reproduced]
In particular, 8-bromoadenosine can be readily converted to 8-bromo-2',3'-O-isopropylidene adenosine by procedures well known to one of ordinary skill in the organic chemistry field. The bromo substituent at the 8-position can then be replaced by a diamine, such as NH2(CH2)nNH2 (where n is 1-3 or 4-250) or NH2(C2H5O)nC2H5NH2 (where n is 1-250), under known amination conditions. This forms the intermediate corresponding to the one prepared from 1,4-diaminobutane in Reaction Scheme 6. One skilled in the art would consider such diamines to be homologues of NH2(CH2)4NH2 (1,4-diaminobutane). As such, they would be expected to have physicochemical properties comparable to those of 1,4-diaminobutane in preparing the desired intermediates.

The intermediates so prepared may then be treated with a compound of the formula XC(O)R4a or XS(O)2R4a (where X is bromo or chloro and R4a is the rest of the R4 group) under standard acylation or sulfonylation conditions. This prepares compounds of the invention in which R4 is attached to an aminolinker. The specification defines R4 as a common modifier for biological molecules. Representative R4 groups include fluorophores, affinity tags, crosslinking agents, chromophores, proteins, peptides, amino acids, nucleotides, nucleosides, nucleic acids, carbohydrates, lipids, PEG, transfection reagents, beads and intercalating agents. Most, if not all, of the compounds that could be used to provide the R4 group of the claimed compounds are known to react with free amine groups. They can therefore easily be reacted with the intermediate so formed to arrive at a compound of the invention. For example, the compound providing the R4 group in Reaction Scheme 6 is dansyl chloride, which forms a compound of the invention where R4 is dansyl. Other compounds providing the R4 group may be reacted with the intermediate in the same way, under conditions known to one skilled in the art. This forms compounds of the invention wherein R1 is -NH(CH2)nNHR4 or -NH(C2H5O)nC2H5NHR4 and R4 is other than dansyl.

The 5'-OH of the ribose moiety of the intermediate so formed can be converted into a good leaving group with a reagent such as MsCl, as illustrated in Reaction Scheme 6, to form the corresponding -OMs group. Following deprotection of the hydroxyl groups at the 2' and 3' positions, the 5'-OMs group can then be replaced with aziridine to form a compound of formula (I).
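A homologous series such as H2N-(CH2)n-NH2 differs from member to member only by CH2 units. The sketch below makes that concrete by computing the composition C(n)H(2n+4)N2 and the molar mass of each member from standard atomic weights. It covers only the alkane diamines. The ether-linked series named in the text is not modeled, and the function names are illustrative.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007}  # standard atomic weights, g/mol


def diaminoalkane_formula(n: int) -> dict[str, int]:
    """Elemental composition of H2N-(CH2)n-NH2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {"C": n, "H": 2 * n + 4, "N": 2}


def molar_mass(formula: dict[str, int]) -> float:
    """Sum of atomic weights times atom counts, in g/mol."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


for n in (2, 3, 4, 5, 6):
    f = diaminoalkane_formula(n)
    print(f"n={n}: C{f['C']}H{f['H']}N{f['N']}  {molar_mass(f):.3f} g/mol")
```

For n = 4 the sketch gives C4H12N2 (1,4-diaminobutane, about 88.15 g/mol). Each step in n adds one CH2, about 14.03 g/mol.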

Likewise, compounds of formula (I) in which Y is -CR3, represented below as formula (Ib), can be readily synthesized from a compound of formula (IIb), as set forth below. The compound of formula (IIb), i.e., 7-iodo(or bromo)-7-deaza-adenosine, also called 5-iodotubercidin, is commercially available through Sigma-Aldrich.
[Formulae (Ib) and (IIb): chemical structures not reproduced]

The halogen substituent (Br or I) at the 7 position of the compound of formula (IIb) can readily be replaced by diamines such as NH2(CH2)nNH2 and NH2(C2H5O)nC2H5NH2. This proceeds in the same manner as described above for compounds of formula (IIa), in which Br at the 8 position is replaced with such a diamine. Consequently, compounds of formula (Ib) where R3 is -NH(CH2)nNHR4 or -NH(C2H5O)nC2H5NHR4 can be readily prepared by one skilled in the art according to the teaching of the specification and the procedures and reagents known at the time the above-identified patent application was filed.

Furthermore, as one skilled in the art can readily appreciate, a compound of formula (I) having a substituent on the aziridine ring can be synthesized according to Reaction Scheme 6 of the specification by replacing aziridine with an appropriately substituted aziridine. For example, when R2 is -CH2CH(COOH)NH2, a suitably substituted aziridine precursor can be compound (4), shown below:

[Compound (4): chemical structure not reproduced]
Compound (4) can be readily prepared from a corresponding epoxide according to a synthetic methodology known to one skilled in the art at the time of the invention. As exemplified by the following reaction scheme, L-allylglycine (1, commercially available from Fluka) can first be protected in an art-known manner to afford compound 2. The double bond in 2 is then oxidized with m-chloroperbenzoic acid (MCPBA), a well-known oxidizing agent, to provide an epoxide (3). The epoxide (3) is then converted to the substituted aziridine (4) under known reaction conditions involving NaN3 and Ph3P (see, e.g., Legters J et al., Tetrahedron Lett. 1989, 30, 4881-4884).



Once compound (4) is attached to the adenosine component of formula (I) according to the last step in Reaction Scheme 6, the aziridine component can be deprotected following art-known methods to provide a compound of formula (I) wherein R2 is -CH2CH(COOH)NH2.

4. I further declare that one skilled in the art could have used routine protocols known in the art at the time of the invention, including those described in the instant specification, to determine whether any of the compounds of the invention acts as a cofactor for an S-adenosyl-L-methionine (SAM)-dependent methyltransferase.

In particular, Examples 1.3 and 2.2 of the specification provide detailed descriptions for preparing the enzymes M-TaqI and M-HhaI and for conducting enzymatic reactions, all of which are routine laboratory procedures known to one skilled in the art. Accordingly, a screening protocol of general applicability based on these examples can be carried out as follows.

The enzyme-catalyzed reaction can be carried out at 37°C in a mixture (500 µl) of the following components:
- cofactor-free methyltransferase (5 nmol, 10 µM);
- a suitable substrate for the particular methyltransferase (5 nmol, 10 µM);
- a compound of formula (I) (10 nmol, 20 µM);
- Tris acetate (20 mM, pH 6.0);
- potassium acetate (50 mM);
- magnesium acetate (10 mM);
- Triton X-100 (0.01%).

The progress of the reaction can be monitored by anion exchange chromatography (Poros 10 HQ, 10 µm, 4.6 × 10 mm, PerSeptive Biosystems, Germany). The product, which results from the enzyme-catalyzed transfer of the compound of formula (I) to the substrate, can then be eluted with aqueous potassium chloride in Tris hydrochloride buffer (10 mM, pH 7.0). The KCl program is 0.2 M for 5 min, followed by a linear gradient to 0.5 M in 5 min and to 1 M in 30 min.
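Two pieces of arithmetic are implicit in this protocol. The first is that 5 nmol and 10 nmol in 500 µl give the stated 10 µM and 20 µM. The second is the KCl concentration at a given time during elution. The sketch below reads the elution program as a piecewise-linear gradient with the breakpoints listed in the comment. That reading is an interpretation of the text, not an instrument method taken from the specification, and the function names are hypothetical.

```python
def micromolar(amount_nmol: float, volume_ul: float) -> float:
    """Concentration in µM for an amount in nmol dissolved in a volume in µl."""
    # nmol / µl = (1e-9 mol) / (1e-6 L) = 1e-3 M = 1 mM = 1000 µM
    return amount_nmol / volume_ul * 1000.0


# Elution program as (time in min, KCl in M) breakpoints, read from the text:
# 0.2 M for 5 min, then linear to 0.5 M over 5 min, then to 1 M over 30 min.
KCL_PROGRAM = [(0.0, 0.2), (5.0, 0.2), (10.0, 0.5), (40.0, 1.0)]


def gradient_value(t_min: float, program: list[tuple[float, float]]) -> float:
    """Linearly interpolate a piecewise-linear gradient program at time t_min."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, c0), (t1, c1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return c0 + (c1 - c0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # hold the final concentration after the program ends


print(micromolar(5, 500), micromolar(10, 500))  # enzyme/substrate, then cofactor
for t in (0, 5, 7.5, 10, 25, 40):
    print(f"t = {t:>4} min: {gradient_value(t, KCL_PROGRAM):.3f} M KCl")
```

The same helper could describe the acetonitrile gradients used in the HPLC steps below by swapping in their breakpoints.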

Additionally, the specification, by way of detailed examples, provides three alternative 
means to analyze the product resulting from the transfer of a compound of formula (I) to 
a substrate in the presence of a suitable methyltransferase. 

First, according to Example 1.3.1 on page 20 of the specification, the product can be analyzed directly by reversed-phase HPLC-coupled electrospray ionization mass spectrometry (RP-HPLC/ESI-MS). More specifically, RP-HPLC/ESI-MS can be performed with an ion-trap mass spectrometer (LCQ, Finnigan MAT, Germany) equipped with a micro HPLC system (M480 and M300, Gynkotek, Germany). The product can be purified by anion exchange chromatography and then desalted by repeated addition of water and ultrafiltration (Microsep 3K, Pall Filtron, Northborough, MA, USA). The product solution can then be injected onto a suitable capillary column (for example, Hypersil-ODS, 3 µm, 150 × 0.3 mm, LC Packings, Amsterdam, Netherlands). It is eluted at 150 µl/min in triethylammonium acetate buffer (0.1 M, pH 7.0) with a linear acetonitrile gradient: 7-10% in 10 min, followed by 10-70% in 30 min. The molecular weight obtained can be compared to the calculated molecular weight of the product.
Second, the product can be analyzed by electrospray ionization mass spectrometry using direct infusion, according to Example 1.3.1 on page 21 of the specification. More specifically, a double-focusing sector field mass spectrometer MAT 90 (Finnigan MAT, Germany) equipped with an ESI II electrospray ion source operated in the negative ion mode can be used. The desalted product in aqueous solution and a liquid sheath flow (2-propanol) can then be delivered using a Harvard syringe pump (Harvard Apparatus, USA). The molecular weight of the product obtained from the electrospray mass spectra can then be compared to its calculated molecular weight.
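In negative-mode ESI, a product such as a modified oligonucleotide typically appears as a series of multiply deprotonated ions [M - zH]^z-. Comparing the result with the calculated molecular weight therefore first requires converting each observed m/z to a neutral mass, M = z(m/z + m_p). A minimal sketch follows. The product mass used is hypothetical, and the sketch does not distinguish average from monoisotopic masses. Which one applies depends on the instrument's resolution and the size of the product.

```python
PROTON_MASS = 1.007276  # Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M from an [M - zH]^z- ion observed at m/z in negative mode."""
    # m/z = (M - z * m_p) / z   =>   M = z * (m/z + m_p)
    return charge * (mz + PROTON_MASS)


def expected_mz(mass: float, charge: int) -> float:
    """m/z at which an [M - zH]^z- ion of neutral mass M is expected."""
    return (mass - charge * PROTON_MASS) / charge


def mass_error_ppm(observed: float, calculated: float) -> float:
    """Deviation of an observed mass from the calculated mass, in ppm."""
    return (observed - calculated) / calculated * 1e6


# Hypothetical product with a calculated mass of 4000.0 Da.
calculated = 4000.0
for z in range(3, 7):
    mz = expected_mz(calculated, z)
    print(f"z = {z}-: m/z {mz:.3f} -> M = {neutral_mass(mz, z):.3f} Da")
```

Each charge state in the spectrum should deconvolute to the same neutral mass. The spread of those values, or their deviation from the calculated mass in ppm, indicates how well the observed product matches the expected transfer product.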

Third, for a product wherein the substrate is an oligodeoxynucleotide (or 
oligonucleotide), the product can be analyzed by mass spectroscopy following an initial 
step of enzymatic fragmentation, according to Example 1.3.1 on page 21 and Example 
2.2 on page 26. Specifically, a purified product of transferring a compound of formula (I) 
to an oligodeoxynucleotide (or oligonucleotide) can be dissolved in potassium phosphate 
buffer (10 mM, pH 7.0, 228 µl) containing magnesium chloride (10 mM), DNase I (2.7 U),
phosphodiesterase from Crotalus durissus (0.041 U), phosphodiesterase from calf
spleen (0.055 U) and alkaline phosphatase (13.7 U) and incubated at 37°C for 20 h. An
aliquot (100 µl) can be injected onto a reversed-phase HPLC column (Hypersil-ODS, 5 µm,
120 Å, 250 x 4.6 mm, Bischoff, Leonberg, Germany), and the products can be
eluted with a gradient of acetonitrile (0-10.5% in 30 min followed by 10.5-28% in 10 min 
and 28-70% in 15 min, 1 ml/min) in triethylammonium acetate buffer (0.1 M, pH 7.0). 
In addition to the deoxynucleosides dC, dA, dG, T and dA^Me, a new compound can be
found to elute. The new compound can be isolated and detected by ESI-MS (LCQ connected to a
nanoelectrospray ion source, Finnigan MAT, Germany). The observed mass is then 
compared with the calculated molecular mass of the product that is expected to be 
obtained by transferring a compound of formula (I) to the oligodeoxynucleotide substrate. 
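The final step, comparing the observed mass with the calculated molecular mass of the expected product, amounts to computing a mass deviation and checking it against an acceptance window. The specification does not state a numerical tolerance here, so the `tolerance_da` parameter, the `compare_masses` helper, and the example masses below are hypothetical; this Python sketch only illustrates the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MassComparison:
    observed_da: float
    calculated_da: float
    deviation_da: float    # observed minus calculated, in daltons
    deviation_ppm: float   # deviation relative to the calculated mass, in ppm
    within_tolerance: bool


def compare_masses(observed_da: float, calculated_da: float,
                   tolerance_da: float) -> MassComparison:
    """Compare an observed mass with a calculated one (hypothetical helper)."""
    if calculated_da <= 0 or tolerance_da < 0:
        raise ValueError("calculated mass must be positive, tolerance non-negative")
    deviation = observed_da - calculated_da
    return MassComparison(
        observed_da=observed_da,
        calculated_da=calculated_da,
        deviation_da=deviation,
        deviation_ppm=deviation / calculated_da * 1e6,
        within_tolerance=abs(deviation) <= tolerance_da,
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only; not measured data.
    result = compare_masses(observed_da=500.3, calculated_da=500.2, tolerance_da=0.5)
    print(result.within_tolerance, round(result.deviation_da, 2))  # True 0.1
```

Reporting the deviation in both daltons and ppm lets the same check be read against whatever accuracy the particular instrument supports.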

5. I declare that one skilled in the art could have used routine protocols known in the art at 
the time of the invention, including those described in the instant specification, to 
determine if a putative methyltransferase could have complexed with an aziridine
derivative of the present invention. 

It was known in the art at the time of the invention that SAM-dependent
methyltransferases acting on diverse substrates, such as DNA, RNA, proteins, peptides and
even small molecules, shared a common catalytic domain for binding the SAM cofactor. As
demonstrated above, the instant invention is directed to the discovery that compounds of
formula (I) behave in substantially the same manner as SAM in the presence of two
particular DNA methyltransferases, M-TaqI and M-HhaI. See, for example, Reaction
Scheme 7 of the specification. This result indicates to one skilled in the art that the
compounds of the invention occupy the same catalytic domain of the DNA
methyltransferases as SAM does, and would therefore function in the same manner with
other SAM-dependent methyltransferases because of the catalytic domain common to such
enzymes.

6. I hereby declare that all statements made herein of my own knowledge are true and that
all statements made on information and belief are believed to be true; and further that 
these statements were made with the knowledge that willful false statements and the like 
so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18
of the United States Code and that such willful false statements may jeopardize the 
validity of the application or any patent issued thereon. 



Date: 
Elmar Weinhold